Mastering Image and Video Segmentation with SAM 2

This guide walks you through what Segment Anything Model 2 (SAM 2) is, how it works, and how to use it to segment objects in images and videos. It offers state-of-the-art performance and flexibility in segmenting objects in images, making it a valuable resource for a variety of computer vision applications. The guide aims to provide a detailed, step-by-step walkthrough for setting up and using SAM 2 to perform image segmentation. By following it, you will be able to generate segmentation masks for images using both box and point prompts.

Learning Objectives

  • Describe the key features and applications of the Segment Anything Model 2 (SAM 2) in image and video segmentation.
  • Successfully configure a CUDA-enabled environment, install necessary dependencies, and clone the Segment Anything Model 2 repository for image segmentation tasks.
  • Apply SAM 2 to generate segmentation masks for images using both box and point prompts and visualize the results effectively.
  • Evaluate how SAM 2 can revolutionize photo and video editing by enabling real-time segmentation, automating complex tasks, and democratizing content creation for a broader audience.

This article was published as a part of the Data Science Blogathon.

Table of contents

  • Prerequisites
  • What is SAM 2?
  • Setting Up and Utilizing SAM 2 for Image Segmentation
  • Key Points to Remember When Working with SAM 2
  • Impressive Potential of SAM 2
  • Conclusion
  • Frequently Asked Questions

Prerequisites

Before you begin, make sure you have a CUDA-enabled GPU for faster processing. Also, verify that you have Python installed on your machine. This guide assumes you have some basic knowledge of Python and image processing concepts.

What is SAM 2?

Segment Anything Model 2 is an advanced tool for image segmentation developed by Facebook AI Research (FAIR). On July 29th, 2024, Meta AI released SAM 2, an advanced image and video segmentation foundation model. SAM 2 lets users supply points or boxes in an image or video to generate segmentation masks for specific objects.


Key Features of SAM 2

  • Advanced Mask Generation: SAM 2 generates high-quality segmentation masks based on user inputs, such as points or bounding boxes.
  • Flexibility: The model supports both image and video segmentation.
  • Speed and Efficiency: With CUDA support, SAM 2 can perform segmentation tasks rapidly, making it suitable for real-time applications.

Core Components of SAM 2

  • Image Encoder: Encodes the input image for processing.
  • Prompt Encoder: Converts user-provided points or boxes into a format the model can use.
  • Mask Decoder: Generates the final segmentation mask based on the encoded inputs.

Applications of SAM 2

Let us now look into the applications of SAM 2 below:

  • Photo and Video Editing: SAM 2 allows for precise object segmentation, enabling detailed edits and creative effects in photos and videos.
  • Autonomous Vehicles: In autonomous driving, SAM 2 can be used to identify and track objects like pedestrians, vehicles, and road signs in real-time.
  • Medical Imaging: SAM 2 can assist in segmenting anatomical structures in medical images, aiding in diagnostics and treatment planning.

What is Image Segmentation?

Image segmentation is a computer vision technique that involves dividing an image into multiple segments or regions to simplify its analysis. Each segment represents a different object or part of an object within the image, making it easier to identify and analyze specific elements.

Types of Image Segmentation

  • Semantic Segmentation: Classifies each pixel into a predefined category.
  • Instance Segmentation: Differentiates between different instances of the same object category.
  • Panoptic Segmentation: Combines semantic and instance segmentation.
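To make the three types concrete, here is a minimal sketch using NumPy on a made-up 4×6 label map. The class IDs and the `class_id * 1000 + instance_id` panoptic encoding are illustrative assumptions only; they are not something SAM 2 produces.

```python
import numpy as np

# Toy semantic map: 0 = background, 1 = "dog" (hypothetical class IDs)
semantic = np.array([
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
])

# Toy instance map: 0 = no instance, 1 and 2 = two separate dogs
instance = np.array([
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
])

# Panoptic map: every pixel carries both its class and its instance identity
panoptic = semantic * 1000 + instance

num_classes = len(np.unique(semantic))        # classes, including background
num_instances = len(np.unique(instance)) - 1  # exclude the 0 "no instance" label
print(num_classes, num_instances, np.unique(panoptic).tolist())
```

The semantic map cannot tell the two dogs apart, the instance map can, and the panoptic map keeps both pieces of information per pixel.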

Setting Up and Utilizing SAM 2 for Image Segmentation

We’ll guide you through the process of setting up the Segment Anything Model 2 (SAM 2) in your environment and utilizing its powerful capabilities for precise image segmentation tasks. From ensuring your GPU is ready to configuring the model and applying it to real images, each step will be covered in detail to help you harness the full potential of SAM 2.

Step 1: Check GPU Availability and Set Up the Environment

First, let’s ensure that your environment is properly set up, starting with checking for GPU availability and setting the current working directory.

# Check GPU availability and CUDA version
!nvidia-smi
!nvcc --version

# Import necessary modules
import os

# Set the current working directory
HOME = os.getcwd()
print("HOME:", HOME)

Explanation

  • !nvidia-smi and !nvcc --version: These commands check whether your system has a CUDA-enabled GPU and display the CUDA version.
  • os.getcwd(): This function returns the current working directory, which can be used for managing file paths.
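If you are running outside a notebook, where the `!` shell prefix is not available, the same check can be sketched in plain Python. The helper names `gpu_tools_available` and `describe_gpu` are made up for this example; it only assumes that the tools, when installed, are on your PATH.

```python
import shutil
import subprocess

def gpu_tools_available(tools=("nvidia-smi", "nvcc")):
    """Return a dict mapping each tool name to True if it is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}

def describe_gpu():
    """Return the nvidia-smi summary text, or None if it cannot be run."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout if result.returncode == 0 else None

status = gpu_tools_available()
print(status)

summary = describe_gpu()
if summary is None:
    print("No usable NVIDIA GPU detected; expect much slower processing on CPU.")
else:
    print(summary)
```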

Step 2: Clone the SAM 2 Repository and Install Dependencies

Next, we need to clone the SAM 2 repository from GitHub and install the required dependencies.

# Clone the SAM 2 repository
!git clone https://github.com/facebookresearch/segment-anything-2.git

# Change to the repository directory
%cd segment-anything-2

# Install the SAM 2 package
!pip install -e .

# Install additional packages
!pip install supervision jupyter_bbox_widget

Explanation

  • !git clone: Clones the SAM 2 repository to your local machine.
  • %cd: Changes the directory to the cloned repository.
  • !pip install -e .: Installs the SAM 2 package in editable mode.
  • !pip install supervision jupyter_bbox_widget: Installs additional packages required for visualization and bounding box widget support.
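Before moving on, it can help to confirm that the installs succeeded. The sketch below uses only the standard library; the module names are taken from the imports used later in this guide, and the helper `missing_packages` is a hypothetical name.

```python
import importlib.util

def missing_packages(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Module names assumed from the import statements used in the later steps
required = ["sam2", "supervision", "jupyter_bbox_widget", "torch", "cv2", "numpy"]
missing = missing_packages(required)
print("Missing:", missing if missing else "none")
```

If anything is listed as missing, rerun the corresponding install command before continuing.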

Step 3: Download Model Checkpoints

Model checkpoints are essential, as they contain the trained parameters of SAM 2. We will download multiple checkpoints for different model sizes.

# Create a directory for checkpoints
!mkdir -p checkpoints

# Download the model checkpoints
!wget -q https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt -P checkpoints
!wget -q https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_small.pt -P checkpoints
!wget -q https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_base_plus.pt -P checkpoints
!wget -q https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt -P checkpoints

Explanation

  • !mkdir -p checkpoints: Creates a directory for storing model checkpoints.
  • !wget -q … -P checkpoints: Downloads the model checkpoints into the checkpoints directory. Different checkpoints represent models of varying sizes and capabilities.
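Because `wget -q` is silent, a failed download is easy to miss. The following sketch pairs each model size with its checkpoint file and checks that the file is present. The checkpoint names come from the download commands above; the config names other than `sam2_hiera_l.yaml` are assumptions based on the repository's naming pattern, and `resolve_model` is a hypothetical helper.

```python
from pathlib import Path

# Checkpoint file names come from the download commands in this step.
# Config names other than "sam2_hiera_l.yaml" are assumptions; confirm them
# against the config files shipped in the cloned repository.
MODEL_FILES = {
    "tiny": ("sam2_hiera_tiny.pt", "sam2_hiera_t.yaml"),
    "small": ("sam2_hiera_small.pt", "sam2_hiera_s.yaml"),
    "base_plus": ("sam2_hiera_base_plus.pt", "sam2_hiera_b+.yaml"),
    "large": ("sam2_hiera_large.pt", "sam2_hiera_l.yaml"),
}

def resolve_model(size, checkpoint_dir="checkpoints"):
    """Return (checkpoint_path, config_name) for a model size, checking the file exists."""
    if size not in MODEL_FILES:
        raise ValueError(f"Unknown size {size!r}; choose from {sorted(MODEL_FILES)}")
    checkpoint_name, config_name = MODEL_FILES[size]
    checkpoint_path = Path(checkpoint_dir) / checkpoint_name
    if not checkpoint_path.is_file():
        raise FileNotFoundError(f"Checkpoint not found: {checkpoint_path}")
    return str(checkpoint_path), config_name

print("Available sizes:", sorted(MODEL_FILES))
```

Calling `resolve_model("large")` from the repository directory would then return the pair of values used for CHECKPOINT and CONFIG in Step 5, or raise an error if the download did not complete.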

Step 4: Download Sample Images

For demonstration purposes, we’ll use some sample images. You can also use your own images by following similar steps.

# Create a directory for data
!mkdir -p data

# Download sample images
!wget -q https://media.roboflow.com/notebooks/examples/dog.jpeg -P data
!wget -q https://media.roboflow.com/notebooks/examples/dog-2.jpeg -P data
!wget -q https://media.roboflow.com/notebooks/examples/dog-3.jpeg -P data
!wget -q https://media.roboflow.com/notebooks/examples/dog-4.jpeg -P data

Explanation

  • !mkdir -p data: Creates a directory for storing sample images.
  • !wget -q … -P data: Downloads the sample images into the data directory.
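A quiet download can also leave behind an empty file or an HTML error page. As a rough sanity check, the sketch below tests whether each downloaded file starts with the JPEG signature bytes. `looks_like_jpeg` is a hypothetical helper, and the check is a heuristic rather than full image validation.

```python
from pathlib import Path

JPEG_MAGIC = b"\xff\xd8\xff"  # first three bytes of a JPEG file

def looks_like_jpeg(path):
    """Return True if the file exists and starts with the JPEG signature."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as f:
        return f.read(3) == JPEG_MAGIC

# File names taken from the download commands in this step
for name in ["dog.jpeg", "dog-2.jpeg", "dog-3.jpeg", "dog-4.jpeg"]:
    print(name, looks_like_jpeg(Path("data") / name))
```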

Step 5: Set Up the SAM 2 Model and Load an Image

Now, we will set up the SAM 2 model, load an image, and prepare it for segmentation.

import cv2
import torch
import numpy as np
import supervision as sv

from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
from sam2.automatic_mask_generator import SAM2AutomaticMaskGenerator

# Set the device to CUDA if available, otherwise fall back to CPU
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

if DEVICE.type == 'cuda':
    # Use bfloat16 autocast for the rest of the notebook session
    torch.autocast(device_type="cuda", dtype=torch.bfloat16).__enter__()

    # Enable TF32 on GPUs with compute capability 8.0 or higher
    if torch.cuda.get_device_properties(0).major >= 8:
        torch.backends.cuda.matmul.allow_tf32 = True
        torch.backends.cudnn.allow_tf32 = True

# Define the model checkpoint and configuration
CHECKPOINT = "checkpoints/sam2_hiera_large.pt"
CONFIG = "sam2_hiera_l.yaml"

# Build the SAM 2 model
sam2_model = build_sam2(CONFIG, CHECKPOINT, device=DEVICE, apply_postprocessing=False)

# Create the automatic mask generator
mask_generator = SAM2AutomaticMaskGenerator(sam2_model)

# Load an image for segmentation (replace this path with your own image,
# for example "data/dog.jpeg" from Step 4)
IMAGE_PATH = "/content/WhatsApp Image 2024-08-02 at 14.17.11_2b223e01.jpg"
image_bgr = cv2.imread(IMAGE_PATH)
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)

# Generate segmentation masks
sam2_result = mask_generator.generate(image_rgb)

Explanation

  • CUDA Setup: Enables bfloat16 autocast (and TF32 on GPUs with compute capability 8.0 or higher) for faster processing, and sets the device to the GPU if one is available.
  • Model Setup: Builds the SAM 2 model using the specified configuration and checkpoint.
  • Image Loading: Loads and converts the sample image to RGB format.
  • Mask Generation: Uses the automatic mask generator to generate segmentation masks for the loaded image.
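The automatic mask generator returns a list of dictionaries; Step 6 relies on the 'segmentation' and 'area' keys of each entry. As a minimal sketch of how such a result could be post-processed, the example below drops very small masks and sorts the rest by area. It runs on synthetic data rather than real model output, and `keep_large_masks`, `make_fake_result`, and the 1% threshold are hypothetical choices.

```python
import numpy as np

def keep_large_masks(results, min_area_fraction=0.01):
    """Keep masks covering at least `min_area_fraction` of the image, largest first."""
    kept = []
    for r in results:
        total_pixels = r["segmentation"].size
        if r["area"] / total_pixels >= min_area_fraction:
            kept.append(r)
    return sorted(kept, key=lambda r: r["area"], reverse=True)

def make_fake_result(height, width, box):
    """Build a synthetic result dict with a rectangular mask (illustration only)."""
    x0, y0, x1, y1 = box
    mask = np.zeros((height, width), dtype=bool)
    mask[y0:y1, x0:x1] = True
    return {"segmentation": mask, "area": int(mask.sum())}

fake_results = [
    make_fake_result(100, 100, (0, 0, 2, 2)),      # 4 px, below the 1% threshold
    make_fake_result(100, 100, (10, 10, 30, 30)),  # 400 px
    make_fake_result(100, 100, (40, 40, 90, 90)),  # 2500 px
]
filtered = keep_large_masks(fake_results)
print([r["area"] for r in filtered])  # prints [2500, 400]
```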

Step 6: Visualize the Segmentation Masks

We will now visualize the segmentation masks generated by SAM 2.

# Annotate the masks on the image
mask_annotator = sv.MaskAnnotator(color_lookup=sv.ColorLookup.INDEX)
detections = sv.Detections.from_sam(sam_result=sam2_result)
annotated_image = mask_annotator.annotate(scene=image_bgr.copy(), detections=detections)

# Plot the original and segmented images side by side
sv.plot_images_grid(
    images=[image_bgr, annotated_image],
    grid_size=(1, 2),
    titles=['source image', 'segmented image']
)

# Extract and plot individual masks
masks = [
    mask['segmentation']
    for mask in sorted(sam2_result, key=lambda x: x['area'], reverse=True)
]

sv.plot_images_grid(
    images=masks[:16],
    grid_size=(4, 4),
    size=(12, 12)
)

Explanation

  • Mask Annotation: Annotates the segmentation masks on the original image.
  • Visualization: Plots the original and segmented images side by side and also plots individual masks.
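To see roughly what mask annotation does under the hood, here is a small NumPy sketch that alpha-blends a color into the pixels selected by a boolean mask. It is an illustration of the general idea, not the implementation used by the supervision library; `overlay_mask` and its parameters are made up for this example.

```python
import numpy as np

def overlay_mask(image, mask, color=(0, 255, 0), alpha=0.5):
    """Blend `color` into `image` wherever `mask` is True; returns a new uint8 array."""
    output = image.astype(np.float32)  # astype returns a copy, so `image` is untouched
    color_arr = np.array(color, dtype=np.float32)
    output[mask] = (1.0 - alpha) * output[mask] + alpha * color_arr
    return output.round().astype(np.uint8)

# A uniform gray 4x4 image with a 2x2 mask in the middle
image = np.full((4, 4, 3), 100, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True

blended = overlay_mask(image, mask, color=(0, 200, 0))
print(blended[0, 0].tolist(), blended[1, 1].tolist())  # prints [100, 100, 100] [50, 150, 50]
```

Pixels outside the mask keep their original values, while masked pixels move halfway toward the overlay color.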

Step 7: Use Box Prompts for Segmentation

Box prompts allow us to specify regions of interest in the image for segmentation.

# Define the SAM 2 Image Predictor
predictor = SAM2ImagePredictor(sam2_model)

# Reload the image
image_bgr = cv2.imread(IMAGE_PATH)
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)

# Encode the image for bounding box input
import base64

def encode_image(filepath):
    with open(filepath, 'rb') as f:
        image_bytes = f.read()
    encoded = str(base64.b64encode(image_bytes), 'utf-8')
    return "data:image/jpg;base64,"+encoded

# Enable custom widget manager in Colab
IS_COLAB = True

if IS_COLAB:
    from google.colab import output
    output.enable_custom_widget_manager()

from jupyter_bbox_widget import BBoxWidget

# Create a bounding box widget
widget = BBoxWidget()
widget.image = encode_image(IMAGE_PATH)

# Display the widget
widget

Explanation

  • Image Predictor: Defines the SAM 2 image predictor.
  • Image Encoding: Encodes the image for use with the bounding box widget.
  • Widget Setup: Sets up a bounding box widget for specifying regions of interest.

Step 8: Get Bounding Boxes and Perform Segmentation

After specifying the bounding boxes, we can use them to generate segmentation masks.

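# Inspect the bounding boxes drawn in the widget
widget.bboxes
# Example output (your values will differ):
# [{'x': 457, 'y': 341, 'width': 0, 'height': 0, 'label': ''},
#  {'x': 205, 'y': 79, 'width': 0, 'height': 1, 'label': ''}]

# Convert the boxes from (x, y, width, height) to (x_min, y_min, x_max, y_max)
boxes = np.array([
    [
        box['x'],
        box['y'],
        box['x'] + box['width'],
        box['y'] + box['height']
    ] for box in widget.bboxes
])

# Set the image in the predictor
predictor.set_image(image_rgb)

# Generate masks using the bounding boxes
masks, scores, logits = predictor.predict(
    box=boxes,
    multimask_output=False
)

# With several boxes the masks come back as (N, 1, H, W); drop the extra
# dimension so they are (N, H, W). A single box already yields (1, H, W).
if masks.ndim == 4:
    masks = np.squeeze(masks, axis=1)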

# Annotate and visualize the masks
box_annotator = sv.BoxAnnotator(color=sv.Color.white())
mask_annotator = sv.MaskAnnotator(color_lookup=sv.ColorLookup.INDEX)

detections = sv.Detections(
    xyxy=boxes,
    mask=masks.astype(bool)
)

source_image = box_annotator.annotate(scene=image_bgr.copy(), detections=detections)
segmented_image = mask_annotator.annotate(scene=image_bgr.copy(), detections=detections)

# Plot the annotated images
sv.plot_images_grid(
    images=[source_image, segmented_image],
    grid_size=(1, 2),
    titles=['source image', 'segmented image']
)

Explanation

  • Bounding Boxes: Retrieves the bounding boxes specified using the widget.
  • Mask Generation: Uses the bounding boxes to generate segmentation masks.
  • Visualization: Annotates and visualizes the masks on the original image.
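The example widget output in this step contains boxes with zero width or height, which typically come from accidental clicks rather than real selections. A minimal sketch of the conversion with such boxes filtered out is shown below. The dictionary keys match the widget output above; `widget_boxes_to_xyxy`, the `min_size` threshold, and the second example box are hypothetical.

```python
import numpy as np

def widget_boxes_to_xyxy(bboxes, min_size=2):
    """Convert widget boxes to an (N, 4) xyxy array, skipping boxes smaller than `min_size` px."""
    rows = [
        [b["x"], b["y"], b["x"] + b["width"], b["y"] + b["height"]]
        for b in bboxes
        if b["width"] >= min_size and b["height"] >= min_size
    ]
    # reshape keeps the (N, 4) shape even when no boxes survive the filter
    return np.array(rows, dtype=np.float32).reshape(-1, 4)

example = [
    {"x": 457, "y": 341, "width": 0, "height": 0, "label": ""},     # accidental click
    {"x": 68, "y": 247, "width": 555, "height": 678, "label": ""},  # hypothetical box
]
xyxy = widget_boxes_to_xyxy(example)
print(xyxy.tolist())  # prints [[68.0, 247.0, 623.0, 925.0]]
```

Checking that the resulting array is not empty before calling the predictor avoids passing degenerate prompts to the model.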

Step 9: Use Point Prompts for Segmentation

Point prompts allow us to specify individual points of interest for segmentation.

# Create point prompts based on bounding boxes
input_point = np.array([
    [
        box['x'] + (box['width'] // 2),
        box['y'] + (box['height'] // 2)
    ] for box in widget.bboxes
])
input_label = np.array([1] * len(input_point))

# Generate masks using the point prompts
masks, scores, logits = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    multimask_output=True
)

# Remove any singleton dimensions from the mask array
masks = np.squeeze(masks)

# Annotate and visualize the masks
point_annotator = sv.PointAnnotator(color_lookup=sv.ColorLookup.INDEX)
mask_annotator = sv.MaskAnnotator(color_lookup=sv.ColorLookup.INDEX)

detections = sv.Detections(
    xyxy=sv.mask_to_xyxy(masks=masks),
    mask=masks.astype(bool)
)

source_image = point_annotator.annotate(scene=image_bgr.copy(), detections=detections)
segmented_image = mask_annotator.annotate(scene=image_bgr.copy(), detections=detections)

# Plot the annotated images
sv.plot_images_grid(
    images=[source_image, segmented_image],
    grid_size=(1, 2),
    titles=['source image', 'segmented image']
)

Explanation

  • Point Prompts: Creates point prompts from the center of each bounding box drawn in the widget.
  • Mask Generation: Uses the point prompts to generate segmentation masks.
  • Visualization: Annotates and visualizes the masks on the original image.
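With multimask_output=True, the predictor returns several candidate masks along with a score for each. If you only want one mask per prompt, a common approach is to keep the highest-scoring candidate. The sketch below shows that selection on synthetic arrays standing in for the predictor output; `pick_best_mask` is a hypothetical helper, and the shapes are assumptions for illustration.

```python
import numpy as np

def pick_best_mask(masks, scores):
    """Return the highest-scoring mask, shaped (1, H, W), together with its score."""
    best = int(np.argmax(scores))
    return masks[best][np.newaxis, ...].astype(bool), float(scores[best])

# Synthetic stand-ins for the predictor output: 3 candidate masks of size 4x4
masks = np.zeros((3, 4, 4), dtype=np.float32)
masks[0, :1, :1] = 1
masks[1, :2, :2] = 1
masks[2, :3, :3] = 1
scores = np.array([0.41, 0.93, 0.78])

best_mask, best_score = pick_best_mask(masks, scores)
print(best_mask.shape, int(best_mask.sum()), best_score)  # prints (1, 4, 4) 4 0.93
```

Keeping the leading dimension means the result can be used wherever an (N, H, W) mask array is expected.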

Key Points to Remember When Working with SAM 2

Let us now look at a few key points:

Revolutionizing Photo and Video Editing

  • Potential to transform the photo and video editing industry.
  • Future enhancements may include improved precision, lower computational requirements, and advanced AI integration.

Real-Time Segmentation and Editing

  • Evolution could lead to real-time segmentation and editing capabilities.
  • Allows seamless alterations in videos and images with minimal effort.

Creative Possibilities for All

  • Opens up new creative possibilities for both professionals and amateurs.
  • Simplifies the manipulation of visual content, the creation of stunning effects, and the production of high-quality media.

Automating Complex Tasks

  • Automates intricate segmentation tasks.
  • Significantly accelerates workflows, making sophisticated editing more accessible and efficient.

Democratizing Content Creation

  • Makes high-level editing tools available to a broader audience.
  • Empowers storytellers and inspires innovation across various sectors, including entertainment, advertising, and education.

Impact on VFX Industry

  • Enhances visual effects (VFX) production by streamlining complex processes.
  • Reduces the time and effort required for creating intricate VFX, enabling more ambitious projects and improving overall quality.

Impressive Potential of SAM 2

The Segment Anything Model 2 (SAM 2) stands poised to revolutionize the fields of photo and video editing by introducing significant advancements in precision and computational efficiency. By integrating advanced AI capabilities, SAM 2 will enable more intuitive user interactions and real-time segmentation and editing, allowing seamless alterations with minimal effort. This groundbreaking technology promises to democratize content creation, empowering both professionals and amateurs to manipulate visual content, create stunning effects, and produce high-quality media with ease.

As SAM 2 automates complex segmentation tasks, it will accelerate workflows and make sophisticated editing accessible to a wider audience. This transformation will inspire innovation across various industries, from entertainment and advertising to education. In the realm of visual effects (VFX), SAM 2 will streamline intricate processes, reducing the time and effort needed to create elaborate VFX. This will enable more ambitious projects, elevate the quality of visual storytelling, and open up new creative possibilities in the VFX world.

Conclusion

By following this guide, you have learned how to set up and use the Segment Anything Model 2 (SAM 2) for image segmentation using both box and point prompts. SAM 2 provides powerful and flexible tools for segmenting objects in images, making it a valuable asset for various computer vision tasks. Feel free to experiment with your images and explore the capabilities of SAM 2 further.

Key Takeaways

  • SAM 2 is an advanced tool developed by Meta AI that enables precise and flexible image and video segmentation using both box and point prompts.
  • The model can significantly enhance photo and video editing by automating complex segmentation tasks, making it more accessible and efficient.
  • Setting up SAM 2 requires a CUDA-enabled GPU and a basic understanding of Python and image processing concepts.
  • SAM 2’s capabilities open new possibilities for both professionals and amateurs in content creation, offering real-time segmentation and creative control.
  • The model has the potential to transform various industries, including visual effects, entertainment, advertising, and education, by democratizing high-level editing tools.

Frequently Asked Questions

Q1. What is SAM 2?

A. SAM 2, or Segment Anything Model 2, is an image and video segmentation model developed by Meta AI that allows users to generate segmentation masks for specific objects by providing box or point prompts.

Q2. What are the prerequisites for utilizing SAM 2?

A. To use SAM 2, you need a CUDA-enabled GPU for faster processing and Python installed on your machine. Basic knowledge of Python and image processing concepts is also helpful.

Q3. How do I set up SAM 2?

A. Set up SAM 2 by checking GPU availability, cloning the SAM 2 repository from GitHub, installing required dependencies, and downloading model checkpoints and sample images for testing.

Q4. What types of prompts can be used with SAM 2 for segmentation?

A. SAM 2 supports both box prompts and point prompts. Box prompts involve specifying regions of interest using bounding boxes, while point prompts involve selecting specific points in the image.

Q5. How can SAM 2 impact photo and video editing?

A. SAM 2 can revolutionize photo and video editing by automating complex segmentation tasks, enabling real-time editing, and making advanced editing tools available to a broader audience, thereby expanding creative possibilities and improving workflow efficiency.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
