A super-evolved version of Meta's "Segment Anything" is here! IDEA leads a top team in China to build it: detect, segment, and generate anything, with 2k stars already

WBOY (forwarded)
2023-04-13 14:40:03, 1804 views

After Meta released its Segment Anything Model (SAM), people in the industry exclaimed that computer vision (CV) no longer exists.

Just one day after SAM was released, a team in China built an evolved version on top of it: "Grounded-SAM".

Note: the team spent an hour making the project logo with Midjourney.

Grounded-SAM integrates SAM with BLIP and Stable Diffusion, combining image segmentation, detection, and generation in a single project, which makes it arguably the most powerful zero-shot vision application so far.

Netizens remarked that the competition is getting far too intense!

Wenhu Chen, a research scientist at Google Brain and an assistant professor of computer science at the University of Waterloo, said "This is too fast."

AI veteran Shen Xiangyang also recommended this latest project:

Grounded-Segment-Anything: automatically detect, segment, and generate anything with image and text inputs. Edge segmentation can be further improved.

So far, this project has garnered 2k stars on GitHub.

Detect everything, segment everything, generate everything

Last week, the release of SAM brought CV its GPT-3 moment. Meta AI even claims that it is the first foundation model for image segmentation.

Within a single prompt-encoder framework, the model accepts a point, a bounding box, or a sentence as the prompt and segments any object with one click.

SAM is broadly general: its zero-shot transfer ability is enough to cover a wide range of use cases. Without additional training, it can be used out of the box in new image domains, whether underwater photos or cell microscopy.

In short, SAM is extremely capable.

Now, researchers in China have built a new idea on top of this model: by combining it with Grounding DINO, a powerful zero-shot object detector, they can detect and segment everything from text input.

With Grounding DINO's zero-shot detection capability, Grounded-SAM can find any object in an image from a text description, and then use SAM's segmentation capability to produce fine-grained masks for those objects.

Finally, Stable Diffusion can be used for controllable text-to-image generation inside the segmented regions.
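The three stages described above (text to boxes, boxes to masks, masks to new pixels) can be sketched as a small orchestration function. This is only an illustration of the data flow: the names `Detection`, `grounded_pipeline`, and the `fake_*` stand-ins are hypothetical and are not taken from the project's code, and the real models are replaced by toy callables so the sketch runs without any weights.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


@dataclass
class Detection:
    phrase: str   # text phrase matched by the detector
    box: Box      # bounding box for that phrase
    score: float  # detector confidence


def grounded_pipeline(
    image,
    text_prompt: str,
    detect: Callable[[object, str], List[Detection]],
    segment: Callable[[object, Box], object],
    inpaint: Optional[Callable[[object, object, str], object]] = None,
    inpaint_prompt: str = "",
):
    """Detect by text, segment each box, optionally regenerate the masked area."""
    detections = detect(image, text_prompt)              # stage 1: text -> boxes
    masks = [segment(image, d.box) for d in detections]  # stage 2: box -> mask
    result = image
    if inpaint is not None:
        for mask in masks:                               # stage 3: mask -> new pixels
            result = inpaint(result, mask, inpaint_prompt)
    return detections, masks, result


# Hypothetical stand-ins for Grounding DINO, SAM, and Stable Diffusion.
def fake_detect(image, prompt):
    return [Detection(prompt, (10.0, 20.0, 110.0, 220.0), 0.9)]


def fake_segment(image, box):
    return {"mask_for": box}


def fake_inpaint(image, mask, prompt):
    return f"{image}+{prompt}"


dets, masks, out = grounded_pipeline(
    "img", "bench", fake_detect, fake_segment, fake_inpaint, "sofa"
)
print(dets[0].phrase, masks[0]["mask_for"], out)
```

Passing the models in as callables keeps each stage replaceable, which matches the article's point that the models can be used together or independently.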

In practice, the Grounded-SAM researchers combined Segment-Anything with three powerful zero-shot models to build an automatic labeling pipeline, and the results they demonstrated are very impressive.

This project combines the following models:

· BLIP: a powerful image captioning model

· Grounding DINO: a state-of-the-art zero-shot detector

· Segment-Anything: a powerful zero-shot segmentation model

· Stable-Diffusion: an excellent generative model

The models can be used together or independently to build a powerful visual workflow. The complete workflow can detect everything, segment everything, and generate everything.

Features of the system include:

BLIP + Grounded-SAM = Automatic Labeler

Use the BLIP model to generate a caption, extract tags from it, and use Grounded-SAM to generate boxes and masks:

· Semi-automatic annotation system: detects the objects named in the input text and provides precise box and mask annotations.

· Fully automatic annotation system: first use the BLIP model to generate a reliable caption for the input image, then let Grounding DINO detect the entities named in the caption, and finally let SAM perform instance segmentation using their boxes as prompts.
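One small step in that flow is turning the entities extracted from the caption into a text prompt for the detector. The sketch below assumes the detector accepts a period-separated list of phrases (for example `bear . grass .`), which is the convention suggested in Grounding DINO's README as far as I know; the helper name `build_text_prompt` is hypothetical and is not part of the project.

```python
from typing import Iterable, List


def build_text_prompt(tags: Iterable[str]) -> str:
    """Join tags into a period-separated prompt such as 'bear . grass .'."""
    seen: List[str] = []
    for tag in tags:
        cleaned = " ".join(tag.lower().split())  # normalize case and whitespace
        cleaned = cleaned.strip(" .,")           # drop stray punctuation
        if cleaned and cleaned not in seen:      # skip empties and duplicates
            seen.append(cleaned)
    return " . ".join(seen) + " ." if seen else ""


# Tags as they might come out of a caption such as
# "a brown bear sitting on the grass" (illustrative input only).
print(build_text_prompt(["Brown bear", "grass", "brown  bear", ""]))
```

Deduplicating before the detector runs avoids asking for the same phrase twice, which would otherwise produce redundant boxes for the segmentation step.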

Stable Diffusion + Grounded-SAM = Data Factory

· Used as a data factory to generate new data: a diffusion inpainting model can generate new data conditioned on the masks.

Segment Anything + HumanEditing

In this branch, the authors use Segment Anything to edit people's hair and faces.

· SAM Hair Editor

· SAM Fashion Editor

The authors proposed some possible future research directions for the Grounded-SAM model:

Automatically generating images to build new datasets; a stronger foundation model pre-trained for segmentation; collaboration with (Chat-)GPT models; and a complete pipeline that automatically annotates images (including bounding boxes and masks) and generates new images.

About the authors

One of the researchers of the Grounded-SAM project is Liu Shilong, a third-year doctoral student in the Department of Computer Science at Tsinghua University.

He recently introduced the latest project he and his team have made on GitHub, and said it is still being improved.

Liu Shilong is currently an intern at the Computer Vision and Robotics Research Center of the Guangdong-Hong Kong-Macao Greater Bay Area Digital Economy Research Institute (IDEA Research Institute), advised by Professor Zhang Lei. His research interests include object detection and multi-modal learning.

Prior to this, he received a bachelor's degree in Industrial Engineering from Tsinghua University in 2020 and interned at Megvii for a period of time in 2019.

Personal homepage: http://www.lsl.zone/

Incidentally, Liu Shilong is also the first author of Grounding DINO, the object detection model released in March this year.

In addition, 4 of his papers were accepted by CVPR 2023, 2 papers were accepted by ICLR 2023, and 1 paper was accepted by AAAI 2023.

Paper address: https://arxiv.org/pdf/2303.05499.pdf

Ren Tianhe, the expert Liu Shilong mentioned, currently works as a computer vision algorithm engineer at the IDEA Research Institute. He is also advised by Professor Zhang Lei, and his main research directions are object detection and multi-modality.

In addition, the project's collaborators include Li Kunchang, a third-year doctoral student at the University of Chinese Academy of Sciences whose main research directions are video understanding and multi-modal learning; Cao He, an intern at the Computer Vision and Robotics Research Center of the IDEA Research Institute whose main research direction is generative models; and Chen Jiayu, a senior algorithm engineer at Alibaba Cloud.

Install and run

The project requires Python 3.8 or later, PyTorch 1.7 or later, and TorchVision 0.8 or later. The authors also strongly recommend installing PyTorch and TorchVision builds with CUDA support.
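Before installing the rest, it can help to confirm that the environment meets these minimums. The following is a minimal sketch, not part of the project: `parse_version` and `meets_minimum` are hypothetical helpers, and the check only reads the standard `__version__` attributes of PyTorch and TorchVision if they happen to be installed.

```python
import re
import sys
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.13.1+cu117' into (1, 13, 1), ignoring local build suffixes."""
    numbers = re.match(r"\d+(?:\.\d+)*", text)
    if numbers is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in numbers.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions component-wise, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


print("Python >= 3.8:", sys.version_info >= (3, 8))
try:
    import torch
    print("PyTorch >= 1.7:", meets_minimum(torch.__version__, "1.7"))
    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    print("PyTorch is not installed")
try:
    import torchvision
    print("TorchVision >= 0.8:", meets_minimum(torchvision.__version__, "0.8"))
except ImportError:
    print("TorchVision is not installed")
```

If `CUDA available` prints `False`, the demos can in principle still be pointed at the CPU through their `--device` flag, but the article does not say how well that works.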

Install Segment Anything:

python -m pip install -e segment_anything

Install GroundingDINO:

python -m pip install -e GroundingDINO

Install diffusers:

pip install --upgrade diffusers[torch]

Install the optional dependencies needed for mask post-processing, saving masks in COCO format, the example notebooks, and exporting the model in ONNX format. Jupyter is also required to run the example notebooks.

pip install opencv-python pycocotools matplotlib onnxruntime onnx ipykernel

Grounding DINO Demo

Download the Grounding DINO checkpoint:

cd Grounded-Segment-Anything
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth

Run the demo:

export CUDA_VISIBLE_DEVICES=0
python grounding_dino_demo.py \
  --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
  --grounded_checkpoint groundingdino_swint_ogc.pth \
  --input_image assets/demo1.jpg \
  --output_dir "outputs" \
  --box_threshold 0.3 \
  --text_threshold 0.25 \
  --text_prompt "bear" \
  --device "cuda"

The model prediction visualization will be saved in output_dir.
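The two threshold flags in the command above control which detections survive. As a rough sketch of my understanding of how Grounding DINO's demo code applies them (an assumption, since the article does not spell it out): each candidate box carries one score per prompt token, a box is kept when its best token score exceeds `box_threshold`, and its label is built from the tokens that score above `text_threshold`. The function name `filter_detections` and the sample scores are made up for illustration.

```python
from typing import List, Sequence, Tuple


def filter_detections(
    token_scores: Sequence[Sequence[float]],
    tokens: Sequence[str],
    box_threshold: float = 0.3,
    text_threshold: float = 0.25,
) -> List[Tuple[int, str, float]]:
    """Return (box index, phrase, best score) for boxes passing box_threshold."""
    kept = []
    for index, scores in enumerate(token_scores):
        best = max(scores)
        if best <= box_threshold:  # not confident about any token: drop the box
            continue
        # The label is made of every token the box matches strongly enough.
        phrase = " ".join(
            tok for tok, s in zip(tokens, scores) if s > text_threshold
        )
        kept.append((index, phrase, best))
    return kept


tokens = ["brown", "bear"]
scores = [
    [0.10, 0.80],  # strongly matches "bear" only
    [0.28, 0.35],  # matches both tokens above text_threshold
    [0.05, 0.20],  # best score is below box_threshold
]
print(filter_detections(scores, tokens))
# prints [(0, 'bear', 0.8), (1, 'brown bear', 0.35)]
```

Under this reading, lowering `box_threshold` yields more (and noisier) boxes, while lowering `text_threshold` makes the labels longer.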

Grounded-Segment-Anything + BLIP Demo

Automatically generating pseudo-labels is simple:

1. Use BLIP (or another captioning model) to generate a caption.

2. Extract tags from the caption, using ChatGPT to handle potentially complex sentences.

3. Use Grounded-Segment-Anything to generate boxes and masks.

export CUDA_VISIBLE_DEVICES=0
python automatic_label_demo.py \
  --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
  --grounded_checkpoint groundingdino_swint_ogc.pth \
  --sam_checkpoint sam_vit_h_4b8939.pth \
  --input_image assets/demo3.jpg \
  --output_dir "outputs" \
  --openai_key your_openai_key \
  --box_threshold 0.25 \
  --text_threshold 0.2 \
  --iou_threshold 0.5 \
  --device "cuda"

The pseudo-labels and the model prediction visualization will be saved in output_dir.
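The `--iou_threshold` flag in the command above suggests that overlapping boxes are merged before masks are generated. A common way to do that is non-maximum suppression (NMS); the pure-Python sketch below illustrates the technique under that assumption and is not the project's own implementation, which I would expect to use a library routine instead.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(
    boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5
) -> List[int]:
    """Keep the highest-scoring boxes, dropping any that overlap a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # prints [0, 2]
```

The first two boxes overlap with an IoU of about 0.68, so the lower-scoring one is suppressed at a threshold of 0.5, while the third box does not overlap anything and is kept.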

Grounded-Segment-Anything + Inpainting Demo

export CUDA_VISIBLE_DEVICES=0
python grounded_sam_inpainting_demo.py \
  --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
  --grounded_checkpoint groundingdino_swint_ogc.pth \
  --sam_checkpoint sam_vit_h_4b8939.pth \
  --input_image assets/inpaint_demo.jpg \
  --output_dir "outputs" \
  --box_threshold 0.3 \
  --text_threshold 0.25 \
  --det_prompt "bench" \
  --inpaint_prompt "A sofa, high quality, detailed" \
  --device "cuda"

Grounded-Segment-Anything + Inpainting Gradio App

python gradio_app.py

The authors provide a visual web page here that makes it easier to try out various examples.


Netizen comments

There is also a deeper meaning behind the project logo:

It is a mosaic-style bear sitting on the ground. It sits on the ground because "ground" also means the ground; a segmented image can be seen as a kind of mosaic style, and "mosaic" (马赛克) sounds like "mask"; and a bear is the subject of the logo because the authors' main example image is of a bear.

After seeing Grounded-SAM, netizens said they knew something like it was coming, but did not expect it to arrive so quickly.

Project author Ren Tianhe said, "The zero-shot detector we use is the best one currently available."

A web demo will also go online in the future.

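Finally, the authors said that the project can be extended to more applications based on generative models, such as fine-grained multi-domain editing and building high-quality, trustworthy data factories. People from all fields are welcome to take part.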

Statement:
This article is reproduced from 51cto.com. In case of infringement, please contact admin@php.cn for removal.