


Meta AI has just released the Segment Anything Model (SAM), the first foundation model for image segmentation.
SAM can segment any object in a photo or video with a single click, and it transfers to other tasks zero-shot.
Overall, SAM follows the foundation-model recipe:
1. A simple yet scalable architecture that can handle multi-modal prompts: text, key points, bounding boxes.
2. An intuitive annotation process, tightly coupled with the model design.
3. A data flywheel that lets the model bootstrap itself to a vast number of unlabeled images.
It is no exaggeration to say that SAM has learned a general concept of what an "object" is, and this holds even for unknown objects, unfamiliar scenes (such as underwater or microscope imagery), and blurry cases.
In addition, SAM generalizes to new tasks and new domains, so practitioners no longer need to fine-tune the model themselves.
Paper address: https://ai.facebook.com/research/publications/segment-anything/
Most powerfully, Meta has implemented a completely new CV paradigm: within a unified framework, the prompt encoder can be given a point, a bounding box, or a sentence, and the object is segmented directly with one click.
In this regard, Tencent AI algorithm expert Jin Tian said, "The prompt paradigm from the NLP field has begun to extend into the CV field, and this time it may completely change CV's traditional prediction mindset. Now you can truly use a single model to segment any object, and interactively at that!"
NVIDIA AI scientist Jim Fan went even further, calling this the "GPT-3 moment" for computer vision.
So, CV really doesn’t exist anymore?
SAM: "Cut out" all objects in any image with one click
Segment Anything is the first foundation model dedicated to image segmentation.
Segmentation, identifying which image pixels belong to an object, has long been a core task of computer vision.
However, creating an accurate segmentation model for a specific task usually requires highly specialized work by experts: infrastructure for training AI plus large volumes of carefully annotated in-domain data, so the barrier to entry is extremely high.
To solve this problem, Meta proposed a foundation model for image segmentation: SAM. This promptable model, trained on diverse data, can adapt to a variety of tasks and is operated much the way prompts are used in NLP models.
The SAM model grasps the concept of "what an object is" and can generate a mask for any object in any image or video, even objects it never saw during training.
SAM is versatile enough to cover a wide range of use cases and works out of the box in new imaging domains without additional training, whether underwater photos or cell microscopy. In other words, SAM already has zero-shot transfer capability.
Meta wrote excitedly in its blog that SAM can be expected, in the future, to be used in any application that needs to find and segment objects in images.
SAM can become part of a larger AI system to develop a more general multi-modal understanding of the world, for example, understanding the visual and textual content of web pages.
In AR/VR, SAM could select objects based on the user's gaze and then "lift" them into 3D.
For content creators, SAM can extract image areas for collage, or video editing.
SAM can also locate and track animals or objects in videos, which is helpful for natural science and astronomy research.
General segmentation method
In the past, there were two methods to solve the segmentation problem.
One is interactive segmentation, which can segment objects of any category but requires a person to refine the mask iteratively.
The second is automatic segmentation, which can segment specific, predefined object categories, but training requires large numbers of manually labeled objects (for example, segmenting cats requires thousands of examples).
In short, neither approach offers a general, fully automatic way to segment images.
SAM can be seen as a generalization of the two, handling both interactive segmentation and automatic segmentation with ease.
Through the model's promptable interface, a wide range of segmentation tasks can be performed simply by designing the right prompt for the model (clicks, boxes, text, and so on).
Additionally, SAM is trained on a diverse, high-quality dataset containing more than 1 billion masks, allowing it to generalize to new objects and images beyond what it observed during training. As a result, practitioners no longer need to collect their own segmentation data to fine-tune a model for their use case.
This flexibility to generalize to new tasks and new domains is a first in the field of image segmentation.
(1) SAM lets users segment an object with a single click, or by interactively clicking multiple points; the model can also be prompted with a bounding box (see the sketch after this list).
(2) When a prompt is ambiguous about which object is meant, SAM can output multiple valid masks, an essential capability for solving segmentation problems in the real world.
(3) SAM can automatically discover and mask all objects in an image.
(4) After precomputing the image embedding, SAM can generate a segmentation mask for any prompt in real time, allowing users to interact with the model in real time.
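To make the prompting workflow concrete, here is a minimal sketch in Python based on Meta's publicly released segment-anything package; the checkpoint filename, image path, and click coordinates are illustrative assumptions, not values from the article:

```python
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

# Load a pretrained SAM checkpoint (filename is an assumption; use whichever
# checkpoint you downloaded, e.g. the ViT-H weights).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# SAM expects an RGB uint8 array of shape (H, W, 3).
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One foreground click (label 1); label 0 would mark background instead.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),   # (x, y) pixel coordinates
    point_labels=np.array([1]),
    multimask_output=True,                 # return several candidate masks
)
best_mask = masks[np.argmax(scores)]       # keep the highest-scoring mask
```

With multimask_output=True the model returns several candidate masks with quality scores, which is how the ambiguity handling described in point (2) surfaces in practice.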
How it works
The SAM the researchers trained can return a valid segmentation mask for any prompt. Prompts can be foreground/background points, rough boxes or masks, free-form text, or in general any information indicating what to segment in the image.
The requirement to produce a valid mask simply means that even when a prompt is ambiguous and could refer to multiple objects (for example, a point on a shirt could denote either the shirt or the person wearing it), the output should be a reasonable mask for one of those objects.
The researchers observed that the pre-training task and interactive data collection impose specific constraints on the model design.
In particular, the model needs to run in real time on a CPU in a web browser, so that annotators can interact with SAM efficiently in real time.
While this runtime constraint means trading off between quality and speed, the researchers found that a simple design achieves good results in practice.
SAM's image encoder produces a one-time embedding for the image, while a lightweight prompt encoder converts any prompt into an embedding vector on the fly. These two sources of information are then combined in a lightweight decoder that predicts segmentation masks.
Once the image embedding has been computed, SAM can generate a segmentation mask for any prompt in just 50 milliseconds, right in a web browser.
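This "embed once, prompt many times" split is easy to see in code. A minimal sketch using the released segment-anything package (the checkpoint name, image path, and click coordinates are assumptions):

```python
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)

# The heavy image encoder runs exactly once here...
predictor.set_image(image)

# ...after which each prompt only touches the lightweight prompt encoder
# and mask decoder, so every call below is close to interactive speed.
for x, y in [(200, 300), (640, 360), (900, 512)]:   # illustrative clicks
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),
        multimask_output=True,
    )
```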
The latest SAM model was trained on 256 A100 GPUs for 68 hours (nearly three days).
Project demonstration
Multiple input prompts
Prompts specifying what to segment in an image enable a variety of segmentation tasks without additional training.
Use interactive points and boxes as prompts
Automatically segment all elements in the image (see the sketch after these examples)
Generate multiple valid masks for ambiguous prompts
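The "segment everything" demo above corresponds to the automatic mask generator in the released segment-anything package; here is a minimal sketch (checkpoint name and image path are assumptions):

```python
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)

# Each record holds a binary mask plus bookkeeping such as its area,
# bounding box, and the model's own quality estimate.
for m in sorted(masks, key=lambda m: m["area"], reverse=True)[:5]:
    print(m["bbox"], m["area"], m["predicted_iou"])
```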
Promptable design
SAM can accept input prompts from other systems.
For example, selecting an object based on the user's gaze information from an AR/VR headset. Meta's development of AI that can understand the real world also paves the way for its future metaverse ambitions.
Alternatively, bounding-box prompts from an object detector can enable text-to-object segmentation.
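As a sketch of that composition, a hypothetical pipeline can feed a detector's box straight into SAM as a prompt; the box coordinates and file names below are made up for illustration:

```python
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A box in XYXY pixel format, as produced by any object detector
# (a text-conditioned detector would give text-to-object segmentation).
detector_box = np.array([100, 150, 420, 500])  # hypothetical detector output

masks, _, _ = predictor.predict(
    point_coords=None,
    point_labels=None,
    box=detector_box[None, :],   # shape (1, 4)
    multimask_output=False,      # a box is usually an unambiguous prompt
)
```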
Scalable output
The output mask can be used as input to other AI systems.
For example, an object's mask can be tracked through a video, turned into 3D via image-editing applications, or used for creative tasks such as collage.
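As one illustration of masks feeding downstream tools, here is a small, self-contained sketch that turns a SAM boolean mask into a transparent RGBA cutout for collage work; the helper function and file names are our own, not part of the release:

```python
import numpy as np
from PIL import Image

def mask_to_cutout(image_rgb: np.ndarray, mask: np.ndarray) -> Image.Image:
    """Turn an (H, W) boolean mask into a transparent RGBA cutout."""
    alpha = mask.astype(np.uint8) * 255       # opaque inside the mask only
    rgba = np.dstack([image_rgb, alpha])      # (H, W, 4)
    return Image.fromarray(rgba, mode="RGBA")

# With `image` and `best_mask` from the earlier prompting sketch:
# mask_to_cutout(image, best_mask).save("cutout.png")
```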
Zero-shot generalization
SAM has learned a general idea of what an object is, and this understanding enables zero-shot generalization to unfamiliar objects and images without additional training.
Select Hover & Click: click Add Mask and a green dot appears; click Remove Area and a red dot appears. The apple-eating Huahua is instantly outlined.
After clicking Everything, all objects the system recognizes are extracted at once. After choosing Cut-Outs, you get a triangular dumpling cut-out in seconds.

SA-1B dataset: 11 million images, 1.1 billion masks

In addition to the new model, Meta also released SA-1B, the largest segmentation dataset to date. It consists of 11 million diverse, high-resolution, privacy-protecting images and 1.1 billion high-quality segmentation masks.

The overall characteristics of the dataset are as follows:
· Total images: 11 million
· Total masks: 1.1 billion
· Average masks per image: 100
· Average image resolution: 1500 × 2250 pixels

Note: neither the images nor the masks carry class labels.

Meta specifically emphasizes that the data was collected through its data engine and that all masks were generated fully automatically by SAM. With the SAM model, collecting new segmentation masks is faster than ever: interactively annotating a mask takes only about 14 seconds, just 2x slower than annotating a bounding box (about 7 seconds with the fastest annotation interface). Compared with previous large-scale segmentation data collection efforts, this is 6.5x faster than COCO's fully manual, polygon-based mask annotation and 2x faster than the previous largest data annotation effort (which was also model-assisted).

However, interactively annotated masks alone are not enough to create a dataset of more than 1 billion masks, so Meta built a data engine for creating SA-1B. This data engine has three "gears":
1. Model-assisted annotation
2. A mix of fully automatic and assisted annotation, which increases the diversity of the collected masks
3. Fully automatic mask creation, which allows the dataset to scale

The final dataset includes more than 1.1 billion segmentation masks collected over roughly 11 million licensed, privacy-protecting images. SA-1B has 400x more masks than any existing segmentation dataset, and human evaluation studies confirm that the masks are of high quality and diversity, in some cases even qualitatively comparable to masks from earlier, smaller, fully manually annotated datasets.

The images in SA-1B were obtained through photo providers in multiple countries spanning different geographic regions and income levels. While some geographic regions are still underrepresented, SA-1B has more images and better overall representation across all regions than previous segmentation datasets.

Finally, Meta says it hopes this data can form the basis of new datasets with additional annotations, such as a text description associated with each mask.
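Since SA-1B ships its masks as COCO run-length encodings in per-image JSON files, a minimal sketch for reading one image's annotations might look like the following; the filename is hypothetical, and the layout assumed here is the one described in the dataset release:

```python
import json
from pycocotools import mask as mask_utils  # pip install pycocotools

# Each SA-1B image comes with a JSON sidecar of automatic annotations
# (the filename below is a placeholder).
with open("sa_223750.json") as f:
    record = json.load(f)

for ann in record["annotations"]:
    # Masks are stored as COCO run-length encodings; decode each one to
    # an (H, W) uint8 array where 1 marks the object.
    binary_mask = mask_utils.decode(ann["segmentation"])
    print(binary_mask.shape, binary_mask.sum())
```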
RBG guru leads the team

Ross Girshick (often called the RBG guru) is a research scientist at Facebook AI Research (FAIR), where he works on computer vision and machine learning.

In 2012, Ross Girshick received his PhD in computer science from the University of Chicago under the supervision of Pedro Felzenszwalb. Before joining FAIR, he was a researcher at Microsoft Research and a postdoc at the University of California, Berkeley, where his advisors were Jitendra Malik and Trevor Darrell. He received the 2017 PAMI Young Researcher Award, as well as the 2017 and 2021 PAMI Mark Everingham Awards in recognition of his contributions to open-source software.

As is well known, Ross and He Kaiming jointly developed the R-CNN approach to object detection, and in 2017 their Mask R-CNN paper won best paper at ICCV 2017.

Meta's creation of this segmentation foundation model for CV has many netizens exclaiming, "Now, CV really doesn't exist anymore."

Meta scientist Justin Johnson said: "To me, Segment Anything's data engine and ChatGPT's RLHF represent a new era of large-scale artificial intelligence. Rather than learning everything from noisy web data, it is better to cleverly combine human annotation with big data to unlock new capabilities. Supervised learning is back!"

The only regret is that while the SAM release was led mainly by Ross Girshick, He Kaiming was absent. One netizen, "matrix Mingzi", commented that this work further proves that multimodality is the future of CV, and that pure CV has no tomorrow.