'AI See-Through Eye': three-time Marr Prize winner Andrew Zisserman leads a team to tackle occlusion completion for arbitrary objects

Mar 08, 2024 pm 03:46 PM

Occlusion is one of the most fundamental yet still unsolved problems in computer vision. Occlusion means that visual information is missing, yet a machine vision system relies on visual information to perceive and understand a scene, and in the real world objects occlude one another everywhere. The latest work from Andrew Zisserman's team at the Visual Geometry Group (VGG) at the University of Oxford systematically addresses occlusion completion for arbitrary objects and proposes a new, more accurate evaluation dataset for the task. On the X platform, the work was praised by Michael Black of MPI, the official CVPR account, and the official account of the University of Southern California's Department of Computer Science, among others. The main content of the paper "Amodal Ground Truth and Completion in the Wild" follows.

Paper link: https://arxiv.org/pdf/2312.17247.pdf
Project homepage: https://www.robots.ox.ac.uk/~vgg/research/amodal/
Code address: https://github.com/Championchess/Amodal-Completion-in-the-Wild

Amodal segmentation aims to complete the occluded parts of an object, that is, to predict a shape mask that covers both the visible and the invisible parts of the object. It can benefit many downstream tasks: object recognition, object detection, instance segmentation, image editing, 3D reconstruction, video object segmentation, reasoning about support relationships between objects, and robot manipulation and navigation, because knowing the complete shape of an occluded object helps in all of them.
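To make the relationship between the two kinds of mask concrete, here is a minimal NumPy sketch. It is only an illustration and is not taken from the paper's code: the function names `occluded_region` and `occlusion_rate` are hypothetical, and masks are assumed to be boolean arrays of the same shape.

```python
import numpy as np

def occluded_region(amodal, modal):
    """Hidden part of an object: inside the full (amodal) shape but not visible."""
    amodal = np.asarray(amodal, dtype=bool)
    modal = np.asarray(modal, dtype=bool)
    return amodal & ~modal

def occlusion_rate(amodal, modal):
    """Fraction of the full shape that is hidden (0.0 for an empty mask)."""
    amodal = np.asarray(amodal, dtype=bool)
    total = amodal.sum()
    if total == 0:
        return 0.0
    return float(occluded_region(amodal, modal).sum()) / float(total)

# Toy example: a 2x4 rectangle whose right half is hidden by something else.
amodal = np.zeros((4, 6), dtype=bool)
amodal[1:3, 1:5] = True      # full shape, 8 pixels
modal = amodal.copy()
modal[:, 3:] = False         # only the left 4 pixels stay visible
print(occlusion_rate(amodal, modal))  # 0.5
```

An amodal segmentation model is given the image (and typically the visible mask) and has to predict `amodal`; the hidden region is what it must infer without direct visual evidence.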

However, evaluating amodal segmentation models in the real world is difficult: images contain plenty of occluded objects, but how can one obtain the ground truth, that is, the amodal mask giving each object's complete shape? Some previous work relies on manually annotated amodal masks, but such annotations inevitably introduce human error. Other work builds synthetic datasets, for example by pasting another object directly on top of a complete object so that the full shape of the occluded object is known, but the resulting images are not real scenes. This work therefore proposes using 3D model projection to construct a large-scale real-image dataset, MP3D-Amodal, which covers multiple object categories and provides amodal masks for accurately evaluating amodal segmentation. The comparison of different datasets is as follows:
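The synthetic approach mentioned above (pasting an occluder over an unoccluded object) can be sketched as follows. This is a hedged illustration of the general idea, not the procedure of any specific dataset; `paste_occluder` and its arguments are hypothetical names, and the occluder is assumed to be already aligned to the target image.

```python
import numpy as np

def paste_occluder(image, object_mask, occluder_rgb, occluder_mask):
    """Composite an occluder over an image containing an unoccluded object.

    image, occluder_rgb: (H, W, 3) arrays; object_mask, occluder_mask: (H, W) bool.
    Returns the composite image plus the object's modal and amodal masks.
    """
    object_mask = np.asarray(object_mask, dtype=bool)
    occluder_mask = np.asarray(occluder_mask, dtype=bool)
    # Occluder pixels overwrite the original image.
    composite = np.where(occluder_mask[..., None], occluder_rgb, image)
    # The object was fully visible before pasting, so its original mask is the amodal mask.
    amodal = object_mask.copy()
    # After pasting, only the part not covered by the occluder remains visible.
    modal = object_mask & ~occluder_mask
    return composite, modal, amodal
```

The amodal mask is exact by construction, which is the appeal of this approach; the drawback noted in the text is that the composite is no longer a real photograph.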

Specifically, taking the MatterPort3D dataset as an example: for any dataset that provides real photos together with the 3D structure of the scene, we can project the 3D shapes of all objects in the scene onto the camera simultaneously to obtain each object's modal mask (its visible shape, since objects occlude one another), and then project the 3D shape of each object onto the camera separately to obtain its amodal mask, that is, its complete shape. Comparing the modal mask with the amodal mask identifies the occluded objects.
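The projection idea can be sketched with depth maps. The following is a simplified illustration under stated assumptions, not the authors' pipeline: it assumes each object has already been rendered on its own into a depth map (`np.inf` where the object does not project), and the names `masks_from_depth`, `occluded_ids`, and the threshold `min_hidden_fraction` are hypothetical.

```python
import numpy as np

def masks_from_depth(depths):
    """depths: (N, H, W) per-object depth maps, np.inf where an object is absent.

    Returns (modal, amodal), both boolean arrays of shape (N, H, W).
    """
    depths = np.asarray(depths, dtype=float)
    # Projecting each object separately: every pixel it covers is in its amodal mask.
    amodal = np.isfinite(depths)
    # Projecting all objects together: at each pixel only the nearest one is visible.
    nearest = depths.argmin(axis=0)
    covered = amodal.any(axis=0)
    ids = np.arange(depths.shape[0])[:, None, None]
    modal = (nearest[None] == ids) & covered[None]
    return modal, amodal

def occluded_ids(modal, amodal, min_hidden_fraction=0.05):
    """Indices of objects whose hidden fraction exceeds the (assumed) threshold."""
    hidden = (amodal & ~modal).sum(axis=(1, 2))
    area = amodal.sum(axis=(1, 2))
    frac = np.divide(hidden, area, out=np.zeros(len(area)), where=area > 0)
    return [i for i, f in enumerate(frac) if f > min_hidden_fraction]
```

For instance, with two objects where the first lies in front of part of the second, `occluded_ids` returns only the index of the second object, whose amodal mask is larger than its modal mask.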

The statistics of the data set are as follows:

A sample of the data set is as follows:

In addition, to tackle complete-shape reconstruction for arbitrary objects, the authors extract prior knowledge about an object's complete shape from the features of the Stable Diffusion model and use it to perform amodal segmentation of any occluded object. The architecture, named SDAmodal, is as follows:

The motivation for using Stable Diffusion features is that Stable Diffusion can complete (inpaint) images, so its features may to some extent contain information about the whole object; and because Stable Diffusion was trained on a very large number of images, its features can be expected to handle arbitrary objects in arbitrary environments. Unlike previous two-stage frameworks, SDAmodal does not require annotated occluder masks as input. SDAmodal has a simple structure but shows strong zero-shot generalization (compare Settings F and H in the following table: training only on COCOA improves results on another dataset with a different domain and different categories). Even without annotations of the occluders, SDAmodal achieves SOTA performance (Setting H) on both COCOA, an existing dataset covering many categories of occluded objects, and the newly proposed MP3D-Amodal dataset.

Beyond the quantitative experiments, qualitative comparisons also show the advantages of SDAmodal. As the figure below shows (all models are trained only on COCOA), for different categories of occluded objects, whether from COCOA or from MP3D-Amodal, SDAmodal substantially improves amodal segmentation, and its predicted amodal masks are closer to the ground truth.

For more details, please read the original paper.
