Excellent Practice of 3D Reconstruction of Intelligent Driving on the Cloud-AI-php.cn

Home

Technology peripherals

Excellent Practice of 3D Reconstruction of Intelligent Driving on the Cloud

PHPz

Oct 27, 2023 pm 05:13 PM

cloud computingIntelligent driving

Excellent Practice of 3D Reconstruction of Intelligent Driving on the Cloud

The continuous development of intelligent driving technology is changing our travel methods and transportation systems. As one of the key technologies, 3D reconstruction plays an important role in intelligent driving systems. In addition to the perception and reconstruction algorithms of the car itself, the implementation and development of autonomous driving technology requires the support of huge cloud reconstruction capabilities. The Volcano Engine Multimedia Laboratory uses industry-leading self-developed 3D reconstruction technology, combined with powerful cloud platform resources and capabilities, to help The implementation and application of related technologies in scenarios such as large-scale reconstruction, automatic annotation, and realistic simulation on the cloud.

This article focuses on the principles and practices of the Volcano Engine Multimedia Laboratory's 3D reconstruction technology in dynamic and static scenes, as well as in combination with advanced light field reconstruction technology, to help everyone better understand and understand how intelligent 3D reconstruction on the cloud works. Serving the field of intelligent driving and assisting the development of the industry.

1. Technical challenges and difficulties

Driving scene reconstruction requires point cloud-level three-dimensional reconstruction of the road environment. Compared with traditional three-dimensional reconstruction technology application scenarios, driving scene reconstruction technology has the following Difficulty:

The environmental factors during vehicle operation are complex and uncontrollable. Different weather, lighting, vehicle speed, road conditions, etc. will affect the data collected by the vehicle sensors, which will affect the reconstruction technology. Stickiness brings challenges.
Feature degradation and texture loss often occur in road scenes. For example, the camera obtains image information without rich visual features, or the lidar obtains scene structure information with high similarity. At the same time, the road surface is used as One of the key elements in reconstruction is the single color and lack of sufficient texture information, which puts higher requirements on reconstruction technology.
There are a large number of vehicle-mounted sensors. Common ones include cameras, lidar, millimeter wave radar, inertial navigation, GPS positioning system, wheel speedometer, etc. How to fuse data from multiple sensors to obtain more accurate reconstruction? As a result, reconstruction techniques are challenged.
The existence of dynamic objects such as moving vehicles, non-motorized vehicles, and pedestrians on the road will bring challenges to traditional reconstruction algorithms. How to eliminate dynamic objects will cause interference to static scene reconstruction, and at the same time, the position and size of dynamic objects will be affected. , Estimating the speed is also one of the difficulties of the project.

2. Introduction to Driving Scene Reconstruction Technology

The reconstruction algorithm in the field of autonomous driving usually adopts a technical route based on lidar and cameras, supplemented by GPS and inertial navigation. LiDAR can directly obtain high-precision ranging information and quickly obtain scene structure. Through pre-lidar-camera joint calibration, the image obtained by the camera can give color, semantics and other information to the laser point cloud. At the same time, GPS and inertial navigation can assist in positioning and reduce drift caused by feature degradation during the reconstruction process. However, due to the high price of multi-line lidar, it is usually used in engineering vehicles and is difficult to be used on a large scale in mass-produced vehicles.

In this regard, the Volcano Engine Multimedia Laboratory has independently developed a set of purely visual driving scene reconstruction technologies, including static scene reconstruction, dynamic object reconstruction and neural radiation field reconstruction technology, which can distinguish dynamic and static objects in the scene. , restore the dense point cloud of the static scene, and highlight key elements such as road surfaces, signs, and traffic lights; it can effectively estimate the position, size, orientation, and speed of moving objects in the scene for subsequent 4D annotation; it can On the basis of static scene reconstruction, the neural radiation field is used to reconstruct and reproduce the scene to achieve free perspective roaming, which can be used for scene editing and simulation rendering. This technical solution does not rely on lidar and can achieve decimeter-level relative errors, achieving reconstruction effects close to lidar with minimal hardware cost.

2.1 Static scene reconstruction technology: eliminate dynamic interference and restore static scenes

Visual reconstruction technology is based on multi-view geometry and requires the scene or object to be reconstructed to have inter-frame consistency , that is, they are in a static state in different image frames, so dynamic objects need to be eliminated during the reconstruction process. According to the importance of different elements in the scene, irrelevant point clouds need to be removed from the dense point cloud, while some key element point clouds are retained. Therefore, the image needs to be semantically segmented in advance. In this regard, Volcano Engine The multimedia laboratory combines AI technology and the basic principles of multi-view geometry to build an advanced A robust, accurate and complete visual reconstruction algorithm framework. The reconstruction process includes three key steps : image preprocessing, sparse reconstruction and dense reconstruction .

Excellent Practice of 3D Reconstruction of Intelligent Driving on the Cloud

The vehicle-mounted camera is in motion during shooting. Due to the exposure time, serious motion blur will appear in the collected images as the vehicle speed increases. In addition, in order to save bandwidth and storage space, the image will be irreversibly lossy compressed during the transmission process, causing further degradation of image quality. To this end, the Volcano Engine Multimedia Laboratory uses an end-to-end neural network to deblur the image, which can improve image quality while suppressing motion blur. The comparison before and after deblurring is shown in the figure below.

Excellent Practice of 3D Reconstruction of Intelligent Driving on the Cloud

Before deblurring (left) After deblurring (right)

In order to distinguish dynamic objects, the Volcano Engine Multimedia Laboratory Dynamic object recognition technology based on optical flow is used to obtain pixel-level dynamic object masks. In the subsequent static scene reconstruction process, feature points falling on the dynamic object area will be eliminated, and only static scenes and objects will be retained.

Excellent Practice of 3D Reconstruction of Intelligent Driving on the Cloud

Optical flow (left) Moving object (right)

During the sparse reconstruction process, the camera's position, orientation and For scene point clouds, the SLAM algorithm (Simultaneous localization and mapping) and the SFM algorithm (Structure from Motion, referred to as SfM) are commonly used. The SFM algorithm can achieve higher reconstruction accuracy without requiring real-time performance. However, the traditional SFM algorithm usually treats each camera as an independent camera, while multiple cameras are usually arranged in different directions on the vehicle, and the relative positions between these cameras are actually fixed (ignoring the vehicle). subtle changes caused by vibration). If the relative position constraints between cameras are ignored, the calculated pose error of each camera will be relatively large. In addition, when occlusion is severe, the pose of individual cameras will be difficult to calculate. In this regard, the Volcano Engine Multimedia Laboratory self-developed an SFM algorithm based on the entire camera group, which can use the prior relative pose constraints between cameras to calculate the pose of the camera group as a whole, and also uses GPS plus inertial navigation. Fusion of positioning results to constrain the center position of the camera group can effectively improve the success rate and accuracy of pose estimation, improve point cloud inconsistencies between different cameras, and reduce point cloud layering.

Excellent Practice of 3D Reconstruction of Intelligent Driving on the Cloud

Traditional SFM (left) Camera group SFM (right)

Due to the color of the ground Single and missing texture, it is difficult for traditional visual reconstruction to restore the complete ground. However, there are key elements on the ground such as lane lines, arrows, text/logos, etc. Therefore, the Volcano Engine Multimedia Laboratory uses quadratic surfaces to fit the ground to assist Perform depth estimation and point cloud fusion of ground areas. Compared with plane fitting, quadratic surface is more suitable for actual road scenes, because the actual road surface is often not an ideal plane. The following is a comparison of the effects of using plane equations and quadratic surface equations to fit the ground.

Excellent Practice of 3D Reconstruction of Intelligent Driving on the Cloud

Plane equation (left) Quadratic surface equation (right)

Treat the laser point cloud as a true value, and The visual reconstruction results are superimposed with this to intuitively measure the accuracy of the reconstructed point cloud. As can be seen from the figure below, the fit between the reconstructed point cloud and the true point cloud is very high. After measurement, the relative error of the reconstruction result is about 15cm.

Excellent Practice of 3D Reconstruction of Intelligent Driving on the Cloud

Volcano Engine Multimedia Laboratory reconstruction results (color) and ground truth point cloud (white)

The following is Volcano Engine Multimedia Comparison of the effects of the laboratory visual reconstruction algorithm and a mainstream commercial reconstruction software. It can be seen that compared with commercial software, the self-developed algorithm of the Volcano Engine Multimedia Laboratory has a better and more complete reconstruction effect. The street signs, traffic lights, telephone poles, as well as lane lines and arrows on the road in the scene have a very high degree of restoration. However, the reconstructed point cloud of commercial software is very sparse, and the road surface is missing in large areas.

Excellent Practice of 3D Reconstruction of Intelligent Driving on the Cloud

A certain mainstream commercial software (left) Volcano Engine Multimedia Laboratory algorithm (right)

2.2 Dynamic reconstruction technology:

It is very difficult to 3D annotate objects on images. Point clouds are needed. When the vehicle only has a visual sensor, it can obtain the target object in the scene. Complete point clouds are difficult. Especially for dynamic objects, dense point clouds cannot be obtained using traditional 3D reconstruction techniques. In order to provide the expression of moving objects and serve 4D annotation, 3D bounding box (hereinafter referred to as 3D Bbox) is used to represent dynamic objects, and the 3D Bbox posture, size, and speed of dynamic objects in the scene at each moment are obtained through self-developed dynamic reconstruction algorithms. etc., thereby complementing the dynamic object reconstruction capability.

Excellent Practice of 3D Reconstruction of Intelligent Driving on the Cloud

Dynamic reconstruction pipeline

For each frame of image collected by the vehicle, first extract the dynamic target in the scene and generate a 3d bbox The initial proposal provides two methods: using 2D target detection and estimating the corresponding 3D bbox through the camera pose; directly using 3D target detection. The two methods can be flexibly selected for different data. 2D detection has good generalization, and 3D detection can obtain better initial values. At the same time, feature points inside the dynamic area of the image are extracted. After obtaining the initial 3D bbox proposal and feature points of a single frame image, establish data correlation between multiple frames: establish object matching through a self-developed multi-target tracking algorithm, and match image features through feature matching technology. After obtaining the matching relationship, the image frames with common view relationships are created as local maps, and an optimization problem is constructed to solve the globally consistent target bbox estimation. Specifically, through feature point matching and dynamic triangulation technology, dynamic 3D points are restored; vehicle motion is modeled, and observations between objects, 3D points, and cameras are jointly optimized to obtain the optimal estimated dynamic object 3D bbox.

Excellent Practice of 3D Reconstruction of Intelligent Driving on the Cloud

2d generates 3d (second from left) 3d target detection example

2.3 NeRF Reconstruction: photorealistic rendering, free perspective

Use neural network for implicit reconstruction, use differentiable rendering model, from existing views Learn how to render images from a new perspective for photorealistic image rendering: Neural Radiation Field (NeRF) technology. At the same time, implicit reconstruction has the characteristics of being editable and querying continuous space, and can be used for tasks such as automatic annotation and simulation data construction in autonomous driving scenarios. Scene reconstruction using NeRF technology is extremely valuable.

Excellent Practice of 3D Reconstruction of Intelligent Driving on the Cloud

The Volcano Engine Multimedia Laboratory combines neural radiation field technology and large scene modeling technology. In specific practice, the data is first processed. Dynamic objects in the scene will cause artifacts in the NeRF reconstruction. With the help of self-developed dynamic and static segmentation, shadow detection and other algorithms, the areas in the scene that are inconsistent with the geometry are extracted and a mask is generated. At the same time, the video inpainting algorithm is used to repair the removed areas. With the help of self-developed 3D reconstruction capabilities, high-precision geometric reconstruction of the scene is performed, including camera parameter estimation and sparse and dense point cloud generation. In addition, the scenario is split to reduce single training resource consumption, and distributed training and maintenance can be performed. During the neural radiation field training process, for large outdoor borderless scenes, the team used some optimization strategies to improve the new perspective generation effect in this scene, such as improving the reconstruction accuracy by simultaneously optimizing poses during training, and based on the level of hash coding. Expression improves model training speed, appearance coding is used to improve the appearance consistency of scenes collected at different times, and mvs dense depth information is used to improve geometric accuracy. The team cooperated with HaoMo Zhixing to complete single-channel acquisition and multi-channel merged NeRF reconstruction. The relevant results were released on Haomo AI Day.

Excellent Practice of 3D Reconstruction of Intelligent Driving on the Cloud

Dynamic object/shadow culling, filling

The above is the detailed content of Excellent Practice of 3D Reconstruction of Intelligent Driving on the Cloud. For more information, please follow other related articles on the PHP Chinese website!

Statement

This article is reproduced at:51CTO.COM. If there is any infringement, please contact admin@php.cn delete

Personal Hacking Will Be A Pretty Fierce BearMay 11, 2025 am 11:09 AM

Cyberattacks are evolving. Gone are the days of generic phishing emails. The future of cybercrime is hyper-personalized, leveraging readily available online data and AI to craft highly targeted attacks. Imagine a scammer who knows your job, your f

Pope Leo XIV Reveals How AI Influenced His Name ChoiceMay 11, 2025 am 11:07 AM

In his inaugural address to the College of Cardinals, Chicago-born Robert Francis Prevost, the newly elected Pope Leo XIV, discussed the influence of his namesake, Pope Leo XIII, whose papacy (1878-1903) coincided with the dawn of the automobile and

FastAPI-MCP Tutorial for Beginners and Experts - Analytics VidhyaMay 11, 2025 am 10:56 AM

This tutorial demonstrates how to integrate your Large Language Model (LLM) with external tools using the Model Context Protocol (MCP) and FastAPI. We'll build a simple web application using FastAPI and convert it into an MCP server, enabling your L

Dia-1.6B TTS : Best Text-to-Dialogue Generation Model - Analytics VidhyaMay 11, 2025 am 10:27 AM

Explore Dia-1.6B: A groundbreaking text-to-speech model developed by two undergraduates with zero funding! This 1.6 billion parameter model generates remarkably realistic speech, including nonverbal cues like laughter and sneezes. This article guide

3 Ways AI Can Make Mentorship More Meaningful Than EverMay 10, 2025 am 11:17 AM

I wholeheartedly agree. My success is inextricably linked to the guidance of my mentors. Their insights, particularly regarding business management, formed the bedrock of my beliefs and practices. This experience underscores my commitment to mentor

AI Unearths New Potential In The Mining IndustryMay 10, 2025 am 11:16 AM

AI Enhanced Mining Equipment The mining operation environment is harsh and dangerous. Artificial intelligence systems help improve overall efficiency and security by removing humans from the most dangerous environments and enhancing human capabilities. Artificial intelligence is increasingly used to power autonomous trucks, drills and loaders used in mining operations. These AI-powered vehicles can operate accurately in hazardous environments, thereby increasing safety and productivity. Some companies have developed autonomous mining vehicles for large-scale mining operations. Equipment operating in challenging environments requires ongoing maintenance. However, maintenance can keep critical devices offline and consume resources. More precise maintenance means increased uptime for expensive and necessary equipment and significant cost savings. AI-driven

Why AI Agents Will Trigger The Biggest Workplace Revolution In 25 YearsMay 10, 2025 am 11:15 AM

Marc Benioff, Salesforce CEO, predicts a monumental workplace revolution driven by AI agents, a transformation already underway within Salesforce and its client base. He envisions a shift from traditional markets to a vastly larger market focused on

AI HR Is Going To Rock Our Worlds As AI Adoption SoarsMay 10, 2025 am 11:14 AM

The Rise of AI in HR: Navigating a Workforce with Robot Colleagues The integration of AI into human resources (HR) is no longer a futuristic concept; it's rapidly becoming the new reality. This shift impacts both HR professionals and employees, dem

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Roblox: Grow A Garden - Complete Mutation Guide

3 weeks agoByDDD

Roblox: Bubble Gum Simulator Infinity - How To Get And Use Royal Keys

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

How to fix KB5055612 fails to install in Windows 10?

3 weeks agoByDDD

Nordhold: Fusion System, Explained

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Mandragora: Whispers Of The Witch Tree - How To Unlock The Grappling Hook

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

SublimeText3 Chinese version

Chinese version, very easy to use

mPDF

mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

SecLists

SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

MinGW - Minimalist GNU for Windows

This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.

SAP NetWeaver Server Adapter for Eclipse

Integrate Eclipse with SAP NetWeaver application server.

Hot Topics

1665

1424

1321

1269

1249