
Multi-grid redundant bounding box annotation for accurate object detection


1. Introduction

Current leading object detectors are two-stage or single-stage networks built on deep CNN backbones repurposed from image classifiers. YOLOv3 is one such well-known state-of-the-art single-stage detector: it takes an input image and divides it into an equal-sized grid, and the grid cell that contains an object's center is responsible for detecting that object.
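As a rough illustration (a minimal sketch, not the paper's code; the image size, grid size, and function name are illustrative), the responsibility rule looks like this in Python:

```python
# Minimal sketch of YOLOv3-style grid responsibility: the cell containing
# an object's center is the one responsible for detecting it.
def responsible_cell(box_center_xy, image_size, grid_size):
    """box_center_xy: (x, y) in pixels; image_size: (W, H); grid_size: S."""
    cx, cy = box_center_xy
    w, h = image_size
    col = int(cx / w * grid_size)  # grid column containing the center
    row = int(cy / h * grid_size)  # grid row containing the center
    return row, col

# Example: a 416x416 input with a 13x13 grid; an object centered at (208, 100)
print(responsible_cell((208, 100), (416, 416), 13))  # -> (3, 6)
```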

This article presents a new mathematical method that assigns multiple grid cells to each object, enabling accurate, tight-fit bounding box prediction. The researchers also propose an effective offline copy-paste data augmentation for object detection. The newly proposed method significantly outperforms some current state-of-the-art object detectors and promises better performance.

2. Background

Object detection networks aim to locate objects in images and label them with accurate, tightly matching bounding boxes. There are two main ways to achieve this. The first, which leads in accuracy, is two-stage object detection, best represented by the region-based convolutional neural network (R-CNN) and its derivatives [Faster R-CNN: Towards real-time object detection with region proposal networks], [Fast R-CNN]. The second group, single-stage networks, is known for excellent detection speed and light weight; representative examples are [You Only Look Once: Unified, real-time object detection], [SSD: Single shot multibox detector], and [Focal loss for dense object detection]. Two-stage networks rely on a region proposal network that generates candidate regions of the image likely to contain objects of interest. Single-stage networks instead handle classification and localization simultaneously in a single forward pass, which makes them typically lighter, faster, and easier to implement.


Today's research follows the YOLO approach, specifically YOLOv3, and proposes a simple hack that lets multiple grid cells simultaneously predict an object's coordinates, class, and objectness confidence. The rationale behind assigning multiple grid cells per object is to increase the likelihood of predicting a tightly fitting bounding box by forcing several cells to work on the same object.


Some advantages of multi-grid assignment include:

(a) The object detector gets a multi-view picture of each object rather than relying on a single grid cell to predict the object's class and coordinates;

(b) less random and uncertain bounding box predictions, meaning higher precision and recall, because neighboring grid cells are trained to predict the same object's class and coordinates;

(c) a reduced imbalance between grid cells that contain an object of interest and those that do not.

Furthermore, since multi-grid assignment is a mathematical exploitation of parameters the network already has, requiring no extra keypoint-pooling layer or post-processing to regroup keypoints with their corresponding objects as in CenterNet and CornerNet, it is arguably a more natural way of achieving what anchor-free or keypoint-based object detectors aim for. In addition to multi-grid redundant annotation, the researchers also introduce a new offline copy-paste data augmentation technique for accurate object detection.

3. Multi-Grid Assignment

[Figure: an image with three objects (a dog, a bicycle, and a car) and their ground-truth bounding boxes]

The image above contains three objects: a dog, a bicycle, and a car. For brevity, we explain multi-grid assignment on a single object. The image shows the bounding boxes of all three objects, with extra detail around the dog's bounding box. The image below zooms into the region around the center of the dog's bounding box. The grid cell containing the center is labeled 0, while the eight surrounding grid cells are labeled 1 through 8.

[Figure: zoomed-in view of the grid cells around the center of the dog's bounding box; the cell containing the center is labeled 0 and its eight neighbors are labeled 1-8]
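The 3x3 neighborhood itself is easy to enumerate. Below is a minimal Python sketch, assuming the method assigns the center cell plus all of its valid eight neighbors; whether every neighbor is always used (e.g., near image borders) is determined by the paper's exact encoding:

```python
# Sketch: enumerate the center cell (label 0 in the figure) and its eight
# surrounding cells (labels 1-8), clipped to the grid boundaries.
def multi_grid_cells(row, col, grid_size):
    cells = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            if 0 <= r < grid_size and 0 <= c < grid_size:
                cells.append((r, c))
    return cells

print(multi_grid_cells(3, 6, 13))  # nine (row, col) pairs around (3, 6)
```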

So far we have covered the familiar scheme in which only the grid cell containing the center of an object's bounding box annotates that object. Relying on a single grid cell per object for the difficult job of predicting the class and a precise, tight-fit bounding box raises several issues, such as:

(a) a huge imbalance between positive and negative grid cells, i.e. cells with and without an object center;

(b) slow bounding box convergence toward the ground truth (GT);

(c) a lack of multi-perspective (multi-angle) views of the object to be predicted.

A natural question to ask here is: "Most objects clearly span more than one grid cell, so is there a simple mathematical way to assign more of those cells to predict the object's class and coordinates alongside the center cell?" Doing so would (a) reduce the imbalance, (b) speed up training convergence of the bounding boxes, since multiple grid cells now target the same object simultaneously, (c) increase the chance of predicting tight-fit bounding boxes, and (d) give grid-based detectors such as YOLOv3 a multi-view rather than a single-point view of each object. The newly proposed multi-grid assignment attempts to answer this question.

[Figure: ground-truth encoding under multi-grid assignment]
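To make the encoding concrete, here is a hedged Python sketch of a multi-grid ground-truth tensor. The layout (S x S x (5 + classes), anchors omitted) loosely follows YOLOv3 and is an assumption, not the paper's exact scheme:

```python
import numpy as np

def encode_ground_truth(boxes, labels, grid_size, num_classes):
    """boxes: list of (cx, cy, w, h), all normalized to [0, 1]."""
    target = np.zeros((grid_size, grid_size, 5 + num_classes), dtype=np.float32)
    for (cx, cy, w, h), cls in zip(boxes, labels):
        row, col = int(cy * grid_size), int(cx * grid_size)
        for dr in (-1, 0, 1):          # the center cell and its
            for dc in (-1, 0, 1):      # eight neighbors
                r, c = row + dr, col + dc
                if not (0 <= r < grid_size and 0 <= c < grid_size):
                    continue
                # Offsets are measured from each assigned cell's own origin,
                # so neighbor cells see offsets outside [0, 1]; the extended
                # coordinate activation under the loss function must cover that.
                tx, ty = cx * grid_size - c, cy * grid_size - r
                target[r, c, 0:4] = (tx, ty, w, h)
                target[r, c, 4] = 1.0        # objectness
                target[r, c, 5 + cls] = 1.0  # one-hot class score
    return target
```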

4. Training

A. The Detection Network: MultiGridDet

MultiGridDet is an object detection network made lighter and faster than YOLOv3 by removing six Darknet convolution blocks, where a convolution block consists of Conv2D, batch normalization, and LeakyReLU. The removed blocks do not come from the classification backbone, Darknet53; instead they are taken from the three multi-scale detection output networks (heads), two from each. Although deeper networks generally perform well, overly deep networks also tend to overfit quickly or slow down significantly.
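As a sketch, one such convolution block might look as follows in PyTorch (the framework choice and the LeakyReLU slope of 0.1, Darknet's usual value, are assumptions):

```python
import torch.nn as nn

# One "convolution block" as described above: Conv2D -> BatchNorm -> LeakyReLU.
def conv_block(in_ch, out_ch, kernel_size=3, stride=1):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, stride,
                  padding=kernel_size // 2, bias=False),  # BN makes bias redundant
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1, inplace=True),
    )
```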

B. The Loss Function


[Figure: coordinate activation function plotted for different β values]
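The exact activation is given by the paper's equations; the sketch below only illustrates the idea with an assumed β-scaled sigmoid, which widens the output range beyond [0, 1] so that neighboring cells can regress coordinate offsets that fall outside their own cell:

```python
import numpy as np

# Hedged sketch of a beta-scaled coordinate activation (the precise formula
# is in the paper). With beta = 1 this is the plain sigmoid YOLOv3 uses; a
# larger beta stretches the output range to [-(beta-1)/2, 1+(beta-1)/2].
def coord_activation(x, beta=1.0):
    s = 1.0 / (1.0 + np.exp(-x))          # standard sigmoid
    return beta * s - (beta - 1.0) / 2.0  # recentered, widened output

for beta in (1.0, 2.0, 3.0):
    print(beta, coord_activation(np.array([-4.0, 0.0, 4.0]), beta))
```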

C. Data Augmentation

Offline copy-paste training image synthesis works as follows. First, a simple image-search script downloads thousands of object-free background images from Google Images using keywords such as landmark, rain, and forest, i.e. images that do not contain the objects of interest. We then iteratively select p objects and their bounding boxes from q random images of the entire training dataset, and generate all possible combinations of the p selected bounding boxes using their indices as IDs. From the combination set, we select a subset of bounding boxes that satisfies the following two conditions (a sketch of this selection step follows the list):

  • when arranged side by side in some random order, they must fit within the given target background image area;
  • they should use the background image space efficiently, covering all or at least most of it, without the objects overlapping.
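A minimal Python sketch of this selection step is shown below; the function name, the maximum combination size, and the 50% coverage threshold standing in for "efficiently utilize" are assumptions, and the actual pixel pasting and label writing are omitted:

```python
import random
from itertools import combinations

def select_patch_set(patches, bg_w, bg_h, max_p=4):
    """patches: list of (w, h) object-crop sizes sampled from the dataset."""
    candidates = []
    for k in range(2, max_p + 1):
        for combo in combinations(range(len(patches)), k):
            ws = [patches[i][0] for i in combo]
            hs = [patches[i][1] for i in combo]
            # Condition 1: arranged side by side, the crops fit the background.
            fits = sum(ws) <= bg_w and max(hs) <= bg_h
            # Condition 2: the crops cover most of the background area
            # (0.5 is an assumed stand-in for "efficiently utilize").
            area = sum(w * h for w, h in (patches[i] for i in combo))
            if fits and area >= 0.5 * bg_w * bg_h:
                candidates.append(combo)
    return random.choice(candidates) if candidates else None
```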
5. Experiments and Visualization

[Table: performance comparison on Pascal VOC 2007]

[Table: performance comparison on the COCO dataset]

[Figure: qualitative detection results]

As can be seen from the figure, the first row shows six input images, the second row shows the network's predictions before non-maximum suppression (NMS), and the last row shows MultiGridDet's final bounding box predictions for those inputs after NMS.
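Since multi-grid assignment deliberately produces several redundant boxes per object, NMS performs the final merging. For reference, a standard (not MultiGridDet-specific) NMS in Python:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.45):
    """boxes: float ndarray (N, 4) as (x1, y1, x2, y2); returns kept indices."""
    order = np.argsort(scores)[::-1]  # highest-scoring boxes first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the top box with all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        order = order[1:][iou < iou_thresh]  # drop heavily overlapping boxes
    return keep
```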


