CVPR 2024 | ByteDance proposes COCONut, a next-generation dataset with denser segmentation than COCO

The AIxiv column is where this site publishes academic and technical content. Over the past few years, the AIxiv column has received and published more than 2,000 reports covering top laboratories at major universities and companies around the world, effectively promoting academic exchange and dissemination. If you have excellent work you would like to share, please feel free to submit it or contact us for coverage. Submission email: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com.

With the development of artificial intelligence, language models and generative models have achieved great success, and their parameter counts keep growing. Models for fine-grained understanding tasks are growing in size as well. However, existing datasets force a trade-off between scale and accuracy. For example, 99.1% of the masks in the SA-1B dataset are machine-generated and carry no semantic labels. Other public datasets have accuracy problems of their own, and they are generally relatively small.

Recently, ByteDance proposed COCONut, a new-generation dataset for fine-grained understanding. Designed around the needs of contemporary deep learning models, it provides manual panoptic segmentation annotations for a total of 383K images, amounting to 5.18M masks, which makes it the largest manually labeled panoptic segmentation dataset to date. The work has been accepted to CVPR 2024.
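
For a rough sense of scale, dividing the two headline figures gives the average number of masks per image. This is simple arithmetic on the rounded numbers above, not a statistic reported by the authors:

```latex
\frac{5.18 \times 10^{6}\ \text{masks}}{3.83 \times 10^{5}\ \text{images}} \approx 13.5\ \text{masks per image}
```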

  • Paper link: https://arxiv.org/abs/2404.08639
  • Code and dataset link: https://xdeng7.github.io/coconut.github.io/

The video shows the masks for a single COCONut image. The statistics on mask density and semantic categories show that the dataset is semantically rich and its masks are finely grained. The dataset also supports a variety of understanding tasks, including panoptic segmentation, instance segmentation, semantic segmentation, object detection, semantically controlled generation, and open-vocabulary segmentation. On multiple tasks, simply replacing the training dataset yields significant performance improvements.
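
One reason a single panoptic annotation can serve several tasks is that semantic labels, instance masks, and bounding boxes can all be derived from it. The sketch below illustrates this on a toy example. The data format (a 2D grid of segment ids plus a hypothetical `Segment` record with a category and a thing/stuff flag) is an assumption for illustration only, not the official COCO or COCONut file schema.

```python
from dataclasses import dataclass

# Hypothetical, simplified annotation format (NOT the official COCO/COCONut
# schema): a panoptic map is a 2D grid of segment ids, and each id maps to a
# category name plus a flag saying whether it is a countable "thing".


@dataclass
class Segment:
    category: str
    is_thing: bool


def semantic_map(panoptic, segments):
    """Semantic segmentation: replace every segment id with its category."""
    return [[segments[sid].category for sid in row] for row in panoptic]


def instance_masks(panoptic, segments):
    """Instance segmentation: one pixel set per "thing" segment."""
    masks = {}
    for y, row in enumerate(panoptic):
        for x, sid in enumerate(row):
            if segments[sid].is_thing:
                masks.setdefault(sid, set()).add((x, y))
    return masks


def bounding_boxes(panoptic, segments):
    """Object detection: tight (x_min, y_min, x_max, y_max) box per instance."""
    boxes = {}
    for sid, pixels in instance_masks(panoptic, segments).items():
        xs = [x for x, _ in pixels]
        ys = [y for _, y in pixels]
        boxes[sid] = (min(xs), min(ys), max(xs), max(ys))
    return boxes


# Toy 3x4 image: sky ("stuff") on top, two people ("things") below.
segments = {0: Segment("sky", False), 1: Segment("person", True), 2: Segment("person", True)}
panoptic = [
    [0, 0, 0, 0],
    [1, 1, 0, 2],
    [1, 1, 0, 2],
]
print(semantic_map(panoptic, segments)[1])  # ['person', 'person', 'sky', 'person']
print(bounding_boxes(panoptic, segments))   # {1: (0, 1, 1, 2), 2: (3, 1, 3, 2)}
```

Stuff regions such as sky contribute to the semantic map but produce no instance mask or box, which is why panoptic labels are a superset of the other annotation types.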

Annotation method

Purely manual annotation is very expensive, which is a major reason why most existing public datasets cannot grow in size. Some datasets instead use model-generated labels directly, but such labels often do little to improve model training, which this paper also verifies. The paper therefore proposes a novel annotation method that combines human annotation with semi-automatic label generation. It preserves annotation accuracy while reducing manual labor costs and speeding up the annotation process.
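
To show why mixing machine proposals with human effort can cut cost, here is a generic human-in-the-loop triage pattern. It is explicitly not COCONut's pipeline, which the article does not describe at this level of detail: the `Proposal` record, confidence score, 0.9 threshold, and per-mask timings are all hypothetical. Confident machine masks go to a quick human check; the rest are edited by hand.

```python
from dataclasses import dataclass

# Generic human-in-the-loop triage, shown for illustration only. The score,
# threshold, and per-mask timings are hypothetical assumptions.


@dataclass
class Proposal:
    mask_id: int
    score: float  # hypothetical model confidence in [0, 1]


def triage(proposals, accept_threshold=0.9):
    """Route confident proposals to quick human verification, the rest to manual editing."""
    quick_verify, manual_edit = [], []
    for p in proposals:
        queue = quick_verify if p.score >= accept_threshold else manual_edit
        queue.append(p.mask_id)
    return quick_verify, manual_edit


def estimated_seconds(quick_verify, manual_edit, verify_s=5.0, edit_s=60.0):
    """Toy cost model: verifying a mask is assumed much cheaper than redrawing it."""
    return len(quick_verify) * verify_s + len(manual_edit) * edit_s


proposals = [Proposal(1, 0.95), Proposal(2, 0.40), Proposal(3, 0.90)]
quick, manual = triage(proposals)
print(quick, manual)                     # [1, 3] [2]
print(estimated_seconds(quick, manual))  # 70.0
```

Under these made-up timings, drawing all three masks by hand would take 180 seconds versus 70 with triage; every mask is still seen by a human, which is how such schemes try to keep accuracy close to fully manual labeling.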

Comparison of labeling accuracy

The researchers compared COCONut and COCO annotations of the same images. As the comparison in the figure below shows, the proposed annotation method achieves almost the same accuracy as purely manual annotation in Photoshop while being more than 10 times faster.
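
The article does not say which metric quantifies agreement between two annotations of the same object. Intersection-over-union (IoU) is a common choice, so the sketch below assumes it, with masks represented as sets of `(x, y)` pixel coordinates.

```python
def mask_iou(a, b):
    """Intersection-over-union of two binary masks given as sets of (x, y) pixels."""
    union = len(a | b)
    # Convention chosen here: two empty masks agree perfectly.
    return len(a & b) / union if union else 1.0


# Two hypothetical annotations of the same object, offset by one column.
annotation_a = {(0, 0), (0, 1), (1, 0), (1, 1)}
annotation_b = {(1, 0), (1, 1), (2, 0), (2, 1)}
print(round(mask_iou(annotation_a, annotation_b), 3))  # 0.333
```

A one-pixel shift on a tiny mask already drops IoU to one third, which is why boundary precision matters so much for fine-grained annotations.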

COCONut Dataset Details

Compared with the existing COCO dataset, COCONut has a similar per-category distribution, but it contains more masks per image than COCO; in particular, a large number of individual images have more than 100 masks. This shows that COCONut's annotations are more refined and its segmentation is denser.
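
Per-image mask density statistics like these can be computed from a list of mask counts, one per image. In a COCO-style panoptic JSON, a count could be taken as `len(annotation["segments_info"])` for each entry in `annotations`; whether the COCONut release uses exactly that schema is an assumption here. The counts below are made up for illustration.

```python
from collections import Counter


def density_summary(masks_per_image, dense_threshold=100):
    """Mean/max masks per image and the fraction of images above a threshold."""
    n = len(masks_per_image)
    return {
        "mean": sum(masks_per_image) / n,
        "max": max(masks_per_image),
        "frac_dense": sum(c > dense_threshold for c in masks_per_image) / n,
    }


def histogram(masks_per_image, bin_width=10):
    """Bucket images by mask count: bin start -> number of images in that bin."""
    counts = Counter((c // bin_width) * bin_width for c in masks_per_image)
    return dict(sorted(counts.items()))


toy_counts = [3, 12, 15, 40, 120]  # made-up counts, not COCO or COCONut data
print(density_summary(toy_counts))  # {'mean': 38.0, 'max': 120, 'frac_dense': 0.2}
print(histogram(toy_counts))        # {0: 1, 10: 2, 40: 1, 120: 1}
```

The histogram's long tail (bins above 100) is the kind of evidence the comparison relies on: two datasets can share a category distribution yet differ sharply in how many masks each image carries.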

Experimental verification

Besides proposing a better training set, the researchers found that the existing validation set does not reflect model improvements well, so the paper also proposes a more challenging validation set, named COCONut-val, that does. As the table below shows, simply replacing the training set with a more accurate one brings large improvements, such as a gain of more than 4 PQ points in panoptic segmentation. However, as the training set grows, evaluating on the existing validation set no longer reveals the model's improvement, whereas COCONut-val shows that the model still improves noticeably with more training data.
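
PQ (panoptic quality) is the standard panoptic segmentation metric: predicted and ground-truth segments of the same category are matched when their IoU is strictly greater than 0.5, and PQ = (sum of IoU over matched pairs) / (TP + 0.5 FP + 0.5 FN), usually reported on a 0 to 100 scale, so "4 points" means 0.04 on the raw scale. The sketch below computes it for a single category on toy data. It is a simplified illustration: the official COCO panoptic evaluation code also handles void and crowd regions and averages PQ over categories.

```python
def iou(a, b):
    """IoU of two segments given as sets of (x, y) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(pred, gt):
    """Simplified single-category PQ = sum(IoU of TP) / (TP + 0.5*FP + 0.5*FN).

    pred, gt: lists of non-overlapping segments (sets of pixels) for one category.
    Unlike the official evaluator, this ignores void and crowd regions.
    """
    matched_gt = set()
    iou_sum, tp = 0.0, 0
    for p in pred:
        for j, g in enumerate(gt):
            if j in matched_gt:
                continue
            score = iou(p, g)
            if score > 0.5:  # IoU > 0.5 guarantees at most one match per segment
                matched_gt.add(j)
                iou_sum += score
                tp += 1
                break
    fp = len(pred) - tp
    fn = len(gt) - tp
    denom = tp + 0.5 * fp + 0.5 * fn
    # Category absent from both: undefined, and skipped when averaging over categories.
    return iou_sum / denom if denom else None


gt = [{(0, 0), (1, 0), (2, 0), (3, 0)}, {(0, 5), (1, 5)}]
pred = [{(0, 0), (1, 0), (2, 0)}, {(9, 9)}]
pq = panoptic_quality(pred, gt)
print(pq, pq * 100)  # 0.375 37.5
```

Here one prediction matches with IoU 0.75, one prediction is a false positive, and one ground-truth segment is missed, giving 0.75 / 2 = 0.375. Because PQ penalizes both missed and spurious segments, a validation set with denser masks leaves more room to expose differences between models.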

The figure below compares the validation sets in terms of semantic categories and mask density. The newly proposed validation set is more challenging and better reflects model improvements.

For more experimental results, please refer to the original paper. The team will release the dataset and corresponding models for public download on its GitHub homepage.

ByteDance Intelligent Creation Team

The Intelligent Creation team is ByteDance's AI and multimedia technology team, covering computer vision, audio and video editing, special effects processing, and other technical fields. Drawing on the company's rich business scenarios, infrastructure resources, and collaborative technical culture, it has built a full closed loop from cutting-edge algorithms to engineering systems to products, aiming to provide the company's internal businesses with cutting-edge content understanding, content creation, interactive experience, and consumption capabilities, as well as industry solutions in various forms.

The Intelligent Creation team currently offers its technical capabilities and services to enterprises through Volcano Engine, ByteDance's cloud service platform. More positions related to large-model algorithms are open.

Statement: This article is reproduced from 机器之心 (Jiqizhixin).