


Beyond BEVFormer! CR3DT: RV fusion helps 3D detection & tracking of new SOTA (ETH)
Written in front&The author’s personal understanding
This article introduces a camera-millimeter wave radar fusion method (CR3DT) for 3D target detection and multi-target tracking ). The lidar-based method has set a high standard for this field, but its high computing power and high cost have restricted the development of this solution in the field of autonomous driving; camera-based 3D target detection and tracking solutions are due to their high cost It is relatively low and has attracted the attention of many scholars, but due to its poor results. Therefore, the fusion of cameras and millimeter wave radar is becoming a promising solution. Under the existing camera framework BEVDet, the author fuses the spatial and velocity information of millimeter wave radar and combines it with the CC-3DT tracking head to significantly improve the accuracy of 3D target detection and tracking and neutralize the contradiction between performance and cost.
Main contribution
Sensor fusion architecture The CR3DT proposed uses intermediate fusion technology before and after the BEV encoder. Integrate millimeter wave radar data; for tracking, a quasi-dense appearance embedding head is used, using millimeter wave radar speed estimation for target association.
Detection Performance Evaluation CR3DT achieved 35.1% mAP and 45.6% nuScenes Detection Score (NDS) on the nuScenes 3D detection validation set. Taking advantage of the rich velocity information contained in radar data, the detector's mean velocity error (mAVE) is reduced by 45.3% compared to SOTA camera detectors.
Tracking performance evaluation CR3DT’s tracking performance on the nuScenes tracking validation set is 38.1% AMOTA, which is an improvement in AMOTA compared to the SOTA tracking model using only cameras 14.9%, the explicit use and further improvements of velocity information in the tracker significantly reduced the number of IDS by about 43%.
Model architecture
This method is based on the EV-Det framework, integrates RADAR’s spatial and velocity information, and combines the CC-3DT tracking head, which is included in its data association An improved millimeter-wave radar is explicitly used to enhance the detector's velocity estimation, ultimately enabling 3D target detection and tracking.
Figure 1 Overall architecture. Detection and tracking are highlighted in light blue and green respectively.
Sensor fusion in BEV space
This module adopts a fusion method similar to PointPillars, including aggregation and connection within it. The BEV grid is set to [-51.2, 51.2] with a resolution of 0.8, resulting in a (128×128) feature grid. Project the image features directly into the BEV space. The number of channels of each grid unit is 64, and then the image BEV features are (64×128×128); similarly, the 18-dimensional information of Radar is aggregated into each In the grid unit, this includes the x, y, and z coordinates of the point, and no enhancement is made to the Radar data. The author confirmed that the Radar point cloud already contains more information than the LiDAR point cloud, so the Radar BEV feature is (18×128×128). Finally, the image BEV features (64×128×128) and Radar BEV features (18×128×128) are directly connected ((64 18)×128×128) as the input of the BEV feature encoding layer. In subsequent ablation experiments, it was found that it is beneficial to add residual connections to the output of the BEV feature encoding layer with a dimension of (256×128×128), resulting in a final input size of the CenterPoint detection head of ((256 18) ×128×128).
Figure 2 Radar point cloud visualization aggregated into BEV space for fusion operation
Tracking module architecture
Tracking is to associate targets in two different frames based on motion correlation and visual feature similarity. During the training process, one-dimensional visual feature embedding vectors are obtained through quasi-dense multivariate positive contrast learning, and then detection and feature embedding are used simultaneously in the tracking stage of CC-3DT. The data association step (DA module in Figure 1) was modified to take advantage of improved CR3DT position detection and velocity estimation. The details are as follows:
Experiments and results
were completed based on the nuScenes data set, and all training did not use CBGS.
Restricted model
Because the author conducted the entire model on a computer with a 3090 graphics card, it is called a restricted model. The target detection part of this model uses BEVDet as the detection baseline, the image encoding backbone is ResNet50, and the image input is set to (3×256×704). Past or future time image information is not used in the model, and the batchsize is set to 8. To alleviate the sparsity of Radar data, five scans are used to enhance the data. No additional temporal information is used in the fusion model.
For target detection, use the scores of mAP, NDS, and mAVE to evaluate; for tracking, use AMOTA, AMOTP, and IDS to evaluate.
Object detection results
Table 1 Detection results on the nuScenes validation set
Table 1 shows the difference between CR3DT and only Detection performance compared to the baseline BEVDet (R50) architecture using cameras. It is obvious that the addition of Radar significantly improves the detection performance. Under the constraints of small resolution and time frame, CR3DT successfully achieves 5.3% mAP and 7.7% NDS improvement compared to camera-only BEVDet. However, due to limitations in computing power, the paper did not achieve experimental results of high resolution, merging time information, etc. In addition, the inference time is also given in the last column of Table 1.
Table 2 Ablation experiment of detection framework
In Table 2, the impact of different fusion architectures on detection indicators is compared. The fusion methods here are divided into two types: the first one is mentioned in the paper, which abandons z-dimensional voxelization and subsequent 3D convolution, and directly aggregates the improved image features and pure RADAR data into columns, thus obtaining The known feature size is ((64 18) × 128 × 128); the other is to voxel the improved image features and pure RADAR data into a cube with a size of 0.8 × 0.8 × 0.8 m, thereby obtaining an alternative feature size is ((64 18) × 10 × 128 × 128), so the BEV compressor module needs to be used in the form of 3D convolution. As can be seen from Table 2(a), an increase in the number of BEV compressors will lead to a decrease in performance, and it can be seen that the first solution performs better. It can also be seen from Table 2(b) that adding the residual block of Radar data can also improve performance, which also confirms what was mentioned in the previous model architecture. Adding residual connections to the output of the BEV feature encoding layer is benefit.
Table 3 Tracking results on the nuScenes validation set based on different configurations of baseline BEVDet and CR3DT
Table 3 shows the tracking of the improved CC3DT tracking model on the nuScenes validation set Results,The performance of the tracker on the baseline and on,the CR3DT detection model is given. The CR3DT model improves the performance of AMOTA by 14.9% over the baseline and decreases it by 0.11 m in AMOTP. Furthermore, it can be seen that IDS is reduced by approximately 43% compared to the baseline.
Table 4 Tracking architecture ablation experiment on CR3DT detection backbone
Conclusion
This work proposes an efficient camera-radar fusion model - CR3DT, specifically for 3D target detection and multi-target tracking. By integrating Radar data into the camera-only BEVDet architecture and introducing the CC-3DT tracking architecture, CR3DT has greatly improved 3D target detection and tracking accuracy, with mAP and AMOTA increasing by 5.35% and 14.9% respectively.
The camera and millimeter wave radar fusion solution has the advantage of low cost compared to pure LiDAR or LiDAR and camera fusion solution, and is close to the current development of autonomous vehicles. In addition, millimeter-wave radar has the advantage of being robust in bad weather and can face a variety of application scenarios. The current big problem is the sparseness of millimeter-wave radar point clouds and the inability to detect height information. However, with the continuous development of 4D millimeter wave radar, I believe that the future integration of cameras and millimeter wave radar solutions will reach a higher level and achieve even better results!
The above is the detailed content of Beyond BEVFormer! CR3DT: RV fusion helps 3D detection & tracking of new SOTA (ETH). For more information, please follow other related articles on the PHP Chinese website!

https://undressaitool.ai/ is Powerful mobile app with advanced AI features for adult content. Create AI-generated pornographic images or videos now!

Tutorial on using undressAI to create pornographic pictures/videos: 1. Open the corresponding tool web link; 2. Click the tool button; 3. Upload the required content for production according to the page prompts; 4. Save and enjoy the results.

The official address of undress AI is:https://undressaitool.ai/;undressAI is Powerful mobile app with advanced AI features for adult content. Create AI-generated pornographic images or videos now!

Tutorial on using undressAI to create pornographic pictures/videos: 1. Open the corresponding tool web link; 2. Click the tool button; 3. Upload the required content for production according to the page prompts; 4. Save and enjoy the results.

The official address of undress AI is:https://undressaitool.ai/;undressAI is Powerful mobile app with advanced AI features for adult content. Create AI-generated pornographic images or videos now!

Tutorial on using undressAI to create pornographic pictures/videos: 1. Open the corresponding tool web link; 2. Click the tool button; 3. Upload the required content for production according to the page prompts; 4. Save and enjoy the results.
![[Ghibli-style images with AI] Introducing how to create free images with ChatGPT and copyright](https://img.php.cn/upload/article/001/242/473/174707263295098.jpg?x-oss-process=image/resize,p_40)
The latest model GPT-4o released by OpenAI not only can generate text, but also has image generation functions, which has attracted widespread attention. The most eye-catching feature is the generation of "Ghibli-style illustrations". Simply upload the photo to ChatGPT and give simple instructions to generate a dreamy image like a work in Studio Ghibli. This article will explain in detail the actual operation process, the effect experience, as well as the errors and copyright issues that need to be paid attention to. For details of the latest model "o3" released by OpenAI, please click here⬇️ Detailed explanation of OpenAI o3 (ChatGPT o3): Features, pricing system and o4-mini introduction Please click here for the English version of Ghibli-style article⬇️ Create Ji with ChatGPT

As a new communication method, the use and introduction of ChatGPT in local governments is attracting attention. While this trend is progressing in a wide range of areas, some local governments have declined to use ChatGPT. In this article, we will introduce examples of ChatGPT implementation in local governments. We will explore how we are achieving quality and efficiency improvements in local government services through a variety of reform examples, including supporting document creation and dialogue with citizens. Not only local government officials who aim to reduce staff workload and improve convenience for citizens, but also all interested in advanced use cases.


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

ZendStudio 13.5.1 Mac
Powerful PHP integrated development environment

SecLists
SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

Dreamweaver Mac version
Visual web development tools

Dreamweaver CS6
Visual web development tools

SublimeText3 Chinese version
Chinese version, very easy to use
