Meta’s Segment Anything Model 3

Have you seen Meta’s SAM (Segment Anything Model) 3?
I know I am critical of LLMs (or even generic multimodal models) being shoved into everything, and I stand by that.
This, however, despite being able to take a text prompt, is no LLM.
To me, this is the ultimate evolution of what YOLO started.
You can feed it a photo or video, and then, using text prompts, bounding boxes, or just clicks, it will identify and outline the objects in the scene. To be clear, the output is a detailed per-pixel mask that traces each object’s shape like a fairly high-density polygon, NOT just a bounding box.
I am not sure I can even fathom the volume of training data that goes into a model like this.
It is not only the variety of objects being selected. Instead of just drawing bounding boxes around the objects, the annotations outline them in incredible detail.
Think of it: 4 coords vs 1k coords (I am ballparking). That is a crazy increase in the density of the training data and output.
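To put a rough number on that ballpark, here is a minimal sketch in plain Python. It does not use SAM 3 or any Meta API; it builds a synthetic disk-shaped mask and compares how many numbers it takes to describe the bounding box versus the outline. The helper names (disk_mask, bounding_box, outline_points) and the 512-pixel image size are my own assumptions for illustration.

```python
def disk_mask(size, radius):
    """Binary mask of a filled disk centered in a size x size grid (synthetic stand-in for an object)."""
    c = size // 2
    return [[(x - c) ** 2 + (y - c) ** 2 <= radius ** 2 for x in range(size)]
            for y in range(size)]


def bounding_box(mask):
    """Axis-aligned box as (x_min, y_min, x_max, y_max): always 4 numbers."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    return (min(xs), min(ys), max(xs), max(ys))


def outline_points(mask):
    """Foreground pixels that touch background (or the image edge) on at least one side."""
    h, w = len(mask), len(mask[0])
    points = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if not (0 <= nx < w and 0 <= ny < h) or not mask[ny][nx]:
                    points.append((x, y))
                    break
    return points


mask = disk_mask(size=512, radius=200)
box = bounding_box(mask)
outline = outline_points(mask)

print("box:", len(box), "numbers")
print("outline:", len(outline), "points,", 2 * len(outline), "numbers")
```

Even for a simple round shape, the outline needs hundreds of times more numbers than the box, and real objects with hair, fingers, or leaves only make that gap wider.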
Meta is definitely making a play at AR with this, which I am excited to see.
The dark side of this is that Meta is going to add it to Instagram to let users augment their images and videos.
That likely won’t help the deepfake problem we have, not to mention the airbrushed social media that distorts our youth’s perception of reality and beauty.
With all that said, I am in skeptical awe of this model and its capabilities.
As with anything, it's a tool, and we should be careful how we use it.
What are your thoughts on SAM 3?