Beyond BEVFormer! CR3DT: RV fusion helps 3D detection & tracking of new SOTA (ETH)-AI-php.cn

Home

Technology peripherals

Beyond BEVFormer! CR3DT: RV fusion helps 3D detection & tracking of new SOTA (ETH)

PHPz

Apr 24, 2024 pm 06:07 PM

radar3d inspection

Written in front&The author’s personal understanding

This article introduces a camera-millimeter wave radar fusion method (CR3DT) for 3D target detection and multi-target tracking ). The lidar-based method has set a high standard for this field, but its high computing power and high cost have restricted the development of this solution in the field of autonomous driving; camera-based 3D target detection and tracking solutions are due to their high cost It is relatively low and has attracted the attention of many scholars, but due to its poor results. Therefore, the fusion of cameras and millimeter wave radar is becoming a promising solution. Under the existing camera framework BEVDet, the author fuses the spatial and velocity information of millimeter wave radar and combines it with the CC-3DT tracking head to significantly improve the accuracy of 3D target detection and tracking and neutralize the contradiction between performance and cost.

Main contribution

Sensor fusion architecture The CR3DT proposed uses intermediate fusion technology before and after the BEV encoder. Integrate millimeter wave radar data; for tracking, a quasi-dense appearance embedding head is used, using millimeter wave radar speed estimation for target association.

Detection Performance Evaluation CR3DT achieved 35.1% mAP and 45.6% nuScenes Detection Score (NDS) on the nuScenes 3D detection validation set. Taking advantage of the rich velocity information contained in radar data, the detector's mean velocity error (mAVE) is reduced by 45.3% compared to SOTA camera detectors.

Tracking performance evaluation CR3DT’s tracking performance on the nuScenes tracking validation set is 38.1% AMOTA, which is an improvement in AMOTA compared to the SOTA tracking model using only cameras 14.9%, the explicit use and further improvements of velocity information in the tracker significantly reduced the number of IDS by about 43%.

Model architecture

This method is based on the EV-Det framework, integrates RADAR’s spatial and velocity information, and combines the CC-3DT tracking head, which is included in its data association An improved millimeter-wave radar is explicitly used to enhance the detector's velocity estimation, ultimately enabling 3D target detection and tracking.

Beyond BEVFormer! CR3DT: RV fusion helps 3D detection & tracking of new SOTA (ETH) Figure 1 Overall architecture. Detection and tracking are highlighted in light blue and green respectively.

Sensor fusion in BEV space

This module adopts a fusion method similar to PointPillars, including aggregation and connection within it. The BEV grid is set to [-51.2, 51.2] with a resolution of 0.8, resulting in a (128×128) feature grid. Project the image features directly into the BEV space. The number of channels of each grid unit is 64, and then the image BEV features are (64×128×128); similarly, the 18-dimensional information of Radar is aggregated into each In the grid unit, this includes the x, y, and z coordinates of the point, and no enhancement is made to the Radar data. The author confirmed that the Radar point cloud already contains more information than the LiDAR point cloud, so the Radar BEV feature is (18×128×128). Finally, the image BEV features (64×128×128) and Radar BEV features (18×128×128) are directly connected ((64 18)×128×128) as the input of the BEV feature encoding layer. In subsequent ablation experiments, it was found that it is beneficial to add residual connections to the output of the BEV feature encoding layer with a dimension of (256×128×128), resulting in a final input size of the CenterPoint detection head of ((256 18) ×128×128).

Beyond BEVFormer! CR3DT: RV fusion helps 3D detection & tracking of new SOTA (ETH)

Figure 2 Radar point cloud visualization aggregated into BEV space for fusion operation

Tracking module architecture

Tracking is to associate targets in two different frames based on motion correlation and visual feature similarity. During the training process, one-dimensional visual feature embedding vectors are obtained through quasi-dense multivariate positive contrast learning, and then detection and feature embedding are used simultaneously in the tracking stage of CC-3DT. The data association step (DA module in Figure 1) was modified to take advantage of improved CR3DT position detection and velocity estimation. The details are as follows:

Beyond BEVFormer! CR3DT: RV fusion helps 3D detection & tracking of new SOTA (ETH)

Experiments and results

were completed based on the nuScenes data set, and all training did not use CBGS.

Restricted model

Because the author conducted the entire model on a computer with a 3090 graphics card, it is called a restricted model. The target detection part of this model uses BEVDet as the detection baseline, the image encoding backbone is ResNet50, and the image input is set to (3×256×704). Past or future time image information is not used in the model, and the batchsize is set to 8. To alleviate the sparsity of Radar data, five scans are used to enhance the data. No additional temporal information is used in the fusion model.

For target detection, use the scores of mAP, NDS, and mAVE to evaluate; for tracking, use AMOTA, AMOTP, and IDS to evaluate.

Object detection results

Beyond BEVFormer! CR3DT: RV fusion helps 3D detection & tracking of new SOTA (ETH)

Table 1 Detection results on the nuScenes validation set

Table 1 shows the difference between CR3DT and only Detection performance compared to the baseline BEVDet (R50) architecture using cameras. It is obvious that the addition of Radar significantly improves the detection performance. Under the constraints of small resolution and time frame, CR3DT successfully achieves 5.3% mAP and 7.7% NDS improvement compared to camera-only BEVDet. However, due to limitations in computing power, the paper did not achieve experimental results of high resolution, merging time information, etc. In addition, the inference time is also given in the last column of Table 1.

Beyond BEVFormer! CR3DT: RV fusion helps 3D detection & tracking of new SOTA (ETH)

Table 2 Ablation experiment of detection framework

In Table 2, the impact of different fusion architectures on detection indicators is compared. The fusion methods here are divided into two types: the first one is mentioned in the paper, which abandons z-dimensional voxelization and subsequent 3D convolution, and directly aggregates the improved image features and pure RADAR data into columns, thus obtaining The known feature size is ((64 18) × 128 × 128); the other is to voxel the improved image features and pure RADAR data into a cube with a size of 0.8 × 0.8 × 0.8 m, thereby obtaining an alternative feature size is ((64 18) × 10 × 128 × 128), so the BEV compressor module needs to be used in the form of 3D convolution. As can be seen from Table 2(a), an increase in the number of BEV compressors will lead to a decrease in performance, and it can be seen that the first solution performs better. It can also be seen from Table 2(b) that adding the residual block of Radar data can also improve performance, which also confirms what was mentioned in the previous model architecture. Adding residual connections to the output of the BEV feature encoding layer is benefit.

Beyond BEVFormer! CR3DT: RV fusion helps 3D detection & tracking of new SOTA (ETH) Table 3 Tracking results on the nuScenes validation set based on different configurations of baseline BEVDet and CR3DT

Table 3 shows the tracking of the improved CC3DT tracking model on the nuScenes validation set Results,The performance of the tracker on the baseline and on,the CR3DT detection model is given. The CR3DT model improves the performance of AMOTA by 14.9% over the baseline and decreases it by 0.11 m in AMOTP. Furthermore, it can be seen that IDS is reduced by approximately 43% compared to the baseline.

Beyond BEVFormer! CR3DT: RV fusion helps 3D detection & tracking of new SOTA (ETH)

Table 4 Tracking architecture ablation experiment on CR3DT detection backbone

Beyond BEVFormer! CR3DT: RV fusion helps 3D detection & tracking of new SOTA (ETH)

Conclusion

This work proposes an efficient camera-radar fusion model - CR3DT, specifically for 3D target detection and multi-target tracking. By integrating Radar data into the camera-only BEVDet architecture and introducing the CC-3DT tracking architecture, CR3DT has greatly improved 3D target detection and tracking accuracy, with mAP and AMOTA increasing by 5.35% and 14.9% respectively.

The camera and millimeter wave radar fusion solution has the advantage of low cost compared to pure LiDAR or LiDAR and camera fusion solution, and is close to the current development of autonomous vehicles. In addition, millimeter-wave radar has the advantage of being robust in bad weather and can face a variety of application scenarios. The current big problem is the sparseness of millimeter-wave radar point clouds and the inability to detect height information. However, with the continuous development of 4D millimeter wave radar, I believe that the future integration of cameras and millimeter wave radar solutions will reach a higher level and achieve even better results!

The above is the detailed content of Beyond BEVFormer! CR3DT: RV fusion helps 3D detection & tracking of new SOTA (ETH). For more information, please follow other related articles on the PHP Chinese website!

Statement

This article is reproduced at:51CTO.COM. If there is any infringement, please contact admin@php.cn delete

undress free porn AI tool websiteMay 13, 2025 am 11:26 AM

https://undressaitool.ai/ is Powerful mobile app with advanced AI features for adult content. Create AI-generated pornographic images or videos now!

How to create pornographic images/videos using undressAIMay 13, 2025 am 11:26 AM

Tutorial on using undressAI to create pornographic pictures/videos: 1. Open the corresponding tool web link; 2. Click the tool button; 3. Upload the required content for production according to the page prompts; 4. Save and enjoy the results.

undress AI official website entrance website addressMay 13, 2025 am 11:26 AM

The official address of undress AI is:https://undressaitool.ai/;undressAI is Powerful mobile app with advanced AI features for adult content. Create AI-generated pornographic images or videos now!

How does undressAI generate pornographic images/videos?May 13, 2025 am 11:26 AM

undressAI porn AI official website addressMay 13, 2025 am 11:26 AM

The official address of undress AI is:https://undressaitool.ai/;undressAI is Powerful mobile app with advanced AI features for adult content. Create AI-generated pornographic images or videos now!

UndressAI usage tutorial guide articleMay 13, 2025 am 10:43 AM

[Ghibli-style images with AI] Introducing how to create free images with ChatGPT and copyrightMay 13, 2025 am 01:57 AM

The latest model GPT-4o released by OpenAI not only can generate text, but also has image generation functions, which has attracted widespread attention. The most eye-catching feature is the generation of "Ghibli-style illustrations". Simply upload the photo to ChatGPT and give simple instructions to generate a dreamy image like a work in Studio Ghibli. This article will explain in detail the actual operation process, the effect experience, as well as the errors and copyright issues that need to be paid attention to. For details of the latest model "o3" released by OpenAI, please click here⬇️ Detailed explanation of OpenAI o3 (ChatGPT o3): Features, pricing system and o4-mini introduction Please click here for the English version of Ghibli-style article⬇️ Create Ji with ChatGPT

Explaining examples of use and implementation of ChatGPT in local governments! Also introduces banned local governmentsMay 13, 2025 am 01:53 AM

As a new communication method, the use and introduction of ChatGPT in local governments is attracting attention. While this trend is progressing in a wide range of areas, some local governments have declined to use ChatGPT. In this article, we will introduce examples of ChatGPT implementation in local governments. We will explore how we are achieving quality and efficiency improvements in local government services through a variety of reform examples, including supporting document creation and dialogue with citizens. Not only local government officials who aim to reduce staff workload and improve convenience for citizens, but also all interested in advanced use cases.

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

How to fix KB5055612 fails to install in Windows 10?

4 weeks agoByDDD

Roblox: Grow A Garden - Complete Mutation Guide

3 weeks agoByDDD

Roblox: Bubble Gum Simulator Infinity - How To Get And Use Royal Keys

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Mandragora: Whispers Of The Witch Tree - How To Unlock The Grappling Hook

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Nordhold: Fusion System, Explained

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

ZendStudio 13.5.1 Mac

Powerful PHP integrated development environment

SecLists

SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.