CVPR 2024 | Does the Segment Anything Model (SAM) generalize poorly? A domain adaptation strategy addresses it


王林

Apr 09, 2024 pm 04:55 PM


The first domain adaptation strategy for the "Segment Anything" large model is here! The paper has been accepted by CVPR 2024.

Introduction

The success of large language models (LLMs) has stimulated interest in foundation models for segmentation in computer vision. These segmentation foundation models are usually applied to zero-shot/few-shot image segmentation through prompt engineering. Among them, the Segment Anything Model (SAM) is the most advanced foundation model for image segmentation.


However, recent research shows that SAM is not very robust or generalizable across a variety of downstream tasks: it performs poorly on medical images, camouflaged objects, and natural images with added corruption. This may be due to a large domain shift between the training dataset and the downstream test dataset. A very important question therefore is how to design a domain adaptation scheme that makes SAM more robust in the real world and across diverse downstream tasks.

There are three main challenges in adapting a pre-trained SAM to downstream tasks:

First, the traditional unsupervised domain adaptation paradigm requires access to both the source dataset and the target dataset, which is often infeasible due to privacy concerns and computational cost.
Second, for domain adaptation, updating all weights usually performs better, but it is limited by the high memory cost.
Finally, SAM shows diverse segmentation capabilities for prompts of different types and granularity, so without prompt information for the downstream task, unsupervised adaptation is very challenging.

Figure 1. We use weak supervision to adapt SAM on various downstream tasks

To address the above challenges, we propose a weakly supervised self-training architecture with anchor regularization and low-rank fine-tuning to improve the robustness and computational efficiency of adaptation.

Specifically, we first adopt a self-training strategy in the source-free setting to avoid dependence on source data. Self-training generates pseudo-labels that supervise the model update, but it is easily affected by incorrect pseudo-labels. We therefore introduce a frozen source model as an anchor network to regularize the model update.

To further reduce the high computational cost of updating the full model weights, we apply a low-rank weight decomposition to the encoder and backpropagate through the low-rank shortcut path.

Finally, to further improve source-free domain adaptation, we introduce weak supervision in the target domain, such as sparse point annotations, to provide stronger domain adaptation information. This weak supervision is naturally compatible with the prompt encoder in SAM.

With weak supervision used as prompts, we obtain more local and explicit self-training pseudo-labels. The adapted model shows stronger generalization on multiple downstream tasks.

We summarize the contributions of this work as follows:

1. Motivated by the generalization problem of SAM in downstream tasks, we propose a task-agnostic solution that requires no source data and adapts SAM through self-training.

2. We use weak supervision, including box, point and other labels, to improve adaptation performance. These weakly supervised labels are fully compatible with SAM's prompt encoder.

3. We conduct extensive experiments on 5 types of downstream instance segmentation tasks to demonstrate the effectiveness of the proposed weakly supervised adaptation method.

Paper address: https://arxiv.org/pdf/2312.03502.pdf
Project address: https://github.com/Zhang-Haojie/WeSAM
Paper title: Improving the Generalization of Segmentation Foundation Model under Distribution Shift via Weakly Supervised Adaptation

Method

The method introduction is divided into four parts:

Segment Anything Model
Self-training-based adaptation framework
How weak supervision helps achieve effective self-training
Low-rank weight update

1. Segment Anything Model

SAM consists of three main components: an image encoder (ImageEncoder), a prompt encoder (PromptEncoder), and a mask decoder (MaskDecoder).

The image encoder is pre-trained with MAE. The entire SAM is then further fine-tuned on the SA-1B training set, which contains 1.1 billion annotations, using a combination of focal loss and Dice loss. At inference time, a test image x is first encoded by the image encoder; then, given a prompt, the lightweight decoder makes predictions at three levels.

2. Source-free domain adaptation via self-training


Figure 2. The proposed self-training architecture with anchor network regularization and contrastive loss regularization

Given an unlabeled target dataset D_T = {x_i} and a pre-trained segmentation model, we use a student-teacher architecture for self-training. As shown in Figure 2, we maintain three encoder networks: the anchor model, the student model, and the teacher model, where the student and teacher models share weights.

Specifically, for each sample x_i, we apply a random weak data augmentation to form the input of the anchor and teacher models, and a random strong data augmentation to form the input of the student model. The three encoder networks then produce three feature maps.

In the decoder network, given N_p prompts, such as boxes, points, or coarse masks, a set of instance segmentation masks is inferred.

Building on the above, we describe the three optimization objectives for self-training below.

1) Student-Teacher self-training

We first update the student/teacher model with a self-training objective that uses the same loss functions as those used to train SAM. Self-training is widely used in semi-supervised learning and has recently been shown to be very effective for source-free domain adaptation. Specifically, we use the predictions generated by the teacher model as pseudo-labels, and use focal loss and Dice loss to supervise the student output.
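The following is a minimal sketch of this objective on flattened per-pixel probabilities, written in plain Python rather than a tensor library. The function names, the 0.5 binarization threshold, the focal parameters, and the loss weights are illustrative assumptions, not values taken from the article.

```python
import math

def focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over flattened per-pixel probabilities."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)          # clip for numerical stability
        p_t = p if t == 1 else 1.0 - p           # probability of the true class
        alpha_t = alpha if t == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)

def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss: 1 - (2 * overlap) / (sum of both masks)."""
    inter = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + smooth) / (denom + smooth)

def self_training_loss(student_probs, teacher_probs, threshold=0.5,
                       focal_weight=20.0, dice_weight=1.0):
    """Supervise the student with binarized teacher predictions.

    threshold and the two weights are assumed values for illustration.
    """
    pseudo = [1 if p > threshold else 0 for p in teacher_probs]
    return (focal_weight * focal_loss(student_probs, pseudo)
            + dice_weight * dice_loss(student_probs, pseudo))
```

A student that agrees with the teacher's pseudo-labels receives a lower loss than one that contradicts them, which is the signal that drives the update.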


2) Anchor loss for robust regularization

Training the network with the self-training loss alone makes it susceptible to the accumulation of incorrect pseudo-labels predicted by the teacher network, the so-called confirmation bias. Observations also show that performance degrades after many iterations of self-training alone. Existing source-free domain adaptation methods often add constraints to counter the negative effects of self-training, such as encouraging a uniform distribution of predictions.

We regularize through an anchor loss, shown in Formula 3, which minimizes the Dice loss between the anchor model and the student and teacher models respectively. The frozen anchor model carries the knowledge inherited from the source domain; it discourages excessive deviation of the self-trained model from the source model and can prevent model collapse.
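As a rough illustration of this regularizer, the sketch below sums the Dice loss of the student against the anchor and of the teacher against the anchor, averaged over instances. It assumes each mask is a flattened list of per-pixel values and that the masks of the three branches are aligned by prompt; the function names and the equal weighting of the two terms are assumptions.

```python
def dice_loss(pred, ref, smooth=1.0):
    """Soft Dice loss between two flattened masks."""
    inter = sum(p * r for p, r in zip(pred, ref))
    denom = sum(pred) + sum(ref)
    return 1.0 - (2.0 * inter + smooth) / (denom + smooth)

def anchor_loss(anchor_masks, student_masks, teacher_masks):
    """Dice(student, anchor) + Dice(teacher, anchor), averaged over instances.

    The anchor predictions come from the frozen source model and act as a
    fixed reference; only the student/teacher side would receive gradients.
    """
    n = len(anchor_masks)
    student_term = sum(dice_loss(s, a)
                       for s, a in zip(student_masks, anchor_masks)) / n
    teacher_term = sum(dice_loss(t, a)
                       for t, a in zip(teacher_masks, anchor_masks)) / n
    return student_term + teacher_term
```

The loss is zero when both branches reproduce the anchor's masks and grows as the self-trained model drifts away from them.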

3) Contrastive loss to regularize the encoder feature space

Figure 3. Contrastive loss between the two branches

The above two training objectives are applied in the output space of the decoder. The experimental section shows that updating the encoder network is the most effective way to adapt SAM, so it is necessary to apply regularization directly to the features output by the encoder network. As shown in Figure 3, we crop the features of each instance from the feature map based on the predicted masks in the anchor and teacher branches.
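One common way to turn a mask and a feature map into a single instance feature is masked average pooling. The article only says that instance features are cropped using the predicted mask, so the pooling choice and the function name below are assumptions made for illustration.

```python
def masked_average_pool(feature_map, mask):
    """Average each channel of feature_map over the pixels where mask is 1.

    feature_map: nested list of shape [C][H][W]
    mask:        nested list of shape [H][W] with 0/1 entries
    Returns a list of C values (all zeros if the mask is empty).
    """
    area = sum(sum(row) for row in mask)
    if area == 0:
        return [0.0] * len(feature_map)
    height, width = len(mask), len(mask[0])
    pooled = []
    for channel in feature_map:
        total = sum(channel[i][j] * mask[i][j]
                    for i in range(height) for j in range(width))
        pooled.append(total / area)
    return pooled
```

For example, a two-channel 2x2 feature map pooled with a diagonal mask averages only the two diagonal pixels in each channel.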


We further define the positive and negative sample pairs in the contrastive loss. Positive pairs are built from the instance features that correspond to the same prompt in the two branches, and negative pairs are built from the instance features that correspond to different prompts. The final contrastive loss is scaled by a temperature coefficient.
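The pairing rule can be sketched with an InfoNCE-style loss over cosine similarities. The exact formula from the paper is not reproduced in this article, so the InfoNCE form, the function names, and the default temperature are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def contrastive_loss(anchor_feats, teacher_feats, temperature=0.1):
    """InfoNCE-style loss; index i in both lists belongs to the same prompt.

    Positive pair: (teacher_feats[i], anchor_feats[i]).
    Negative pairs: (teacher_feats[i], anchor_feats[j]) for j != i.
    """
    n = len(anchor_feats)
    total = 0.0
    for i in range(n):
        logits = [cosine(teacher_feats[i], anchor_feats[j]) / temperature
                  for j in range(n)]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[i]
    return total / n
```

Features that match across branches for the same prompt give a small loss, while features that match a different prompt's instance give a large one.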


4) Total loss

We combine the above three loss functions into the final source-free adaptation loss.


3. Prompt generation for self-training

SAM requires prompt input to indicate the target object to segment, but prompts can suffer from granularity ambiguity. Prompt engineering can be carried out in a fully automated manner or through human interaction.

1) Fully automatic prompt generation

We first use densely sampled grid points as prompt input and generate initial-stage segmentation masks with the anchor model. Masks with low IoU scores and low stability scores are discarded, and non-maximum suppression is then applied to obtain the segmentation results. Next, a fixed set of prompts is generated from the final masks and used as prompt input for all three branches. As a result, the three networks output the same number of masks, with an exact one-to-one correspondence.
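The grid sampling, score filtering, and non-maximum suppression steps can be sketched as follows. The candidate format (a dict with `box`, `pred_iou`, and `stability` keys), the function names, and all threshold values are hypothetical; NMS is done on bounding boxes here for simplicity.

```python
def grid_points(width, height, points_per_side):
    """Evenly spaced (x, y) point prompts covering the image."""
    step_x = width / points_per_side
    step_y = height / points_per_side
    return [((i + 0.5) * step_x, (j + 0.5) * step_y)
            for j in range(points_per_side)
            for i in range(points_per_side)]

def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def filter_and_nms(candidates, iou_thresh=0.88, stability_thresh=0.95,
                   nms_thresh=0.7):
    """Drop low-score masks, then greedily suppress overlapping ones."""
    kept = [c for c in candidates
            if c["pred_iou"] >= iou_thresh
            and c["stability"] >= stability_thresh]
    kept.sort(key=lambda c: c["pred_iou"], reverse=True)
    selected = []
    for cand in kept:
        if all(box_iou(cand["box"], s["box"]) <= nms_thresh
               for s in selected):
            selected.append(cand)
    return selected
```

The surviving masks would then be converted into one fixed prompt set that is shared by the anchor, teacher, and student branches, which is what gives the one-to-one correspondence between their outputs.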

2) Weak supervision as prompts

Although prompts can be obtained by grid sampling on the image and filtering out low-quality and duplicate masks for automatic segmentation, these segmentations are of relatively poor quality, may contain many false positive predictions, and have unclear granularity. The resulting prompts vary in quality, which makes self-training less effective.

Therefore, drawing on previous weakly supervised domain adaptation work, we propose to use three forms of weak supervision: bounding boxes (box), sparse point annotations (point), and coarse segmentation polygons (coarse mask). In SAM, these forms of weak supervision match the prompt input directly, so weak supervision can be integrated seamlessly to adapt SAM.

4. Low-rank weight update

The huge encoder network of a foundation model makes updating all of the model's weights extremely difficult. However, many existing studies show that updating the encoder network weights is an effective way to tune pre-trained models.

To update the encoder network more efficiently and at lower cost, we choose a computationally friendly low-rank update method. For each weight θ in the encoder network, we use a low-rank approximation ω = AB and set a compression ratio r. Only A and B are updated via backpropagation, which reduces memory usage. During inference, the weights are reconstructed by combining the low-rank approximation with the original weights, i.e., θ = θ + AB.
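A small sketch of the merge step and of the parameter savings, using nested lists as matrices. The names `merge_low_rank`, `trainable_params`, and `rank` are illustrative; how the article's compression ratio r maps to a concrete rank is not specified, so the rank is passed in directly.

```python
def matmul(a, b):
    """Plain matrix product of nested lists a (m x k) and b (k x n)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def merge_low_rank(theta, A, B):
    """Return theta + A @ B, the weight used at inference time."""
    delta = matmul(A, B)
    return [[t + d for t, d in zip(theta_row, delta_row)]
            for theta_row, delta_row in zip(theta, delta)]

def trainable_params(d_out, d_in, rank):
    """Number of entries in A (d_out x rank) plus B (rank x d_in)."""
    return rank * (d_out + d_in)
```

For a 768 x 768 weight and an assumed rank of 4, A and B together hold 6,144 trainable entries versus 589,824 in the full matrix, which is where the memory saving during backpropagation comes from.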

Experiments

In the experiments, we provide detailed comparisons with state-of-the-art methods, together with qualitative results. Finally, we analyze the effectiveness of each component and the specific design of the network.

1. Dataset

In this work, we evaluate different types of downstream segmentation tasks, some of which have significant distribution shifts from SA-1B. The datasets cover clear natural images, natural images with added corruption, medical images, camouflaged objects, and robotic images, 10 datasets in total.

Data partitioning: each downstream dataset is divided into non-overlapping training and test sets.

The datasets used to evaluate each type of downstream task are listed in Table 1, along with the training/test split.


2. Experimental details

Segment-Anything model: Due to memory limitations, we adopt ViT-B as the encoder network and use the standard prompt encoder and mask decoder.

Prompt generation: Prompt inputs for both the training and evaluation phases are computed from the instance segmentation ground-truth (GT) masks, simulating human interaction as weak supervision.

Specifically, we extract the box as the minimum bounding box of the entire GT mask. Points are created by randomly selecting 5 positive sample points inside the GT mask and 5 negative sample points outside the mask. Coarse masks are simulated by fitting polygons to the GT masks.
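The box and point prompts described above can be simulated from a binary mask along these lines. The function names and the `(x1, y1, x2, y2)` box convention are assumptions; the polygon-fitting step for coarse masks is not shown.

```python
import random

def mask_to_box(mask):
    """Minimum bounding box (x1, y1, x2, y2) of a binary mask, or None."""
    coords = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))

def sample_points(mask, num_pos=5, num_neg=5, rng=None):
    """Sample positive points inside the mask and negative points outside."""
    rng = rng or random.Random()
    inside = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    outside = [(x, y) for y, row in enumerate(mask)
               for x, value in enumerate(row) if not value]
    # Guard against masks with fewer pixels than requested points.
    positives = rng.sample(inside, min(num_pos, len(inside)))
    negatives = rng.sample(outside, min(num_neg, len(outside)))
    return positives, negatives
```

Passing a seeded `random.Random` instance as `rng` makes the sampled prompts reproducible across training and evaluation runs.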

3. Experimental results

Tables 2, 3, 4, and 5 report the test results on natural images with added corruption, clear natural images, medical images, and camouflaged object datasets, respectively. The complete experimental results can be found in the paper. The experiments show that our scheme outperforms pre-trained SAM and state-of-the-art domain adaptation schemes on almost all downstream segmentation datasets.


4. Visualization results

Some of the visualization results are shown in Figure 4; more can be found in the paper.


Figure 4. Visualization results for some examples

5. Ablation experiments and additional analysis

We analyzed the effectiveness of each of the three self-training optimization objectives on the COCO data set, as shown in Table 7. In Table 7, we also analyze the effect of the proposed method on adaptation without using any weak supervision information.


We analyzed the performance differences between training and testing using different categories of prompts, as shown in Table 8. Experiments show that our scheme still performs well under cross-prompt conditions.


In addition, we analyzed the results of optimizing different modules, including the decoder, LayerNorm, different fine-tuning schemes, and their combinations. The experiments show that fine-tuning the encoder with the LoRA scheme works best.


Summary

Although vision foundation models can perform well on segmentation tasks, they can still perform poorly on downstream tasks. We study the generalization ability of the Segment-Anything model on multiple downstream image segmentation tasks and propose a self-training method based on anchor regularization and low-rank fine-tuning. The method does not require access to the source dataset, has a low memory cost, is naturally compatible with weak supervision, and significantly improves adaptation performance. Extensive experiments show that the proposed domain adaptation method significantly improves the generalization ability of SAM under various distribution shifts.


Statement

This article is reproduced from 机器之心. If there is any infringement, please contact admin@php.cn to request deletion.
