In 2024, will there be substantial breakthroughs and progress in end-to-end autonomous driving in China?

As many people have noticed, Tesla's FSD V12 has been rolled out widely in North America and is gaining more and more user recognition for its performance, and end-to-end autonomous driving has become the technical direction the autonomous driving industry cares about most. Recently I had the opportunity to exchange views with excellent engineers, product managers, investors, and media people across the industry. I found that everyone is very interested in end-to-end autonomous driving, yet misunderstandings persist even around some of its basic concepts. As someone who has been fortunate enough to experience both the map-based and map-free urban driving functions of leading domestic brands, as well as both FSD V11 and V12, I would like to discuss several common misunderstandings about end-to-end autonomous driving at this stage, based on my professional background and years of tracking Tesla FSD's progress, and give my own interpretation of these issues.

Misunderstanding 1: Can end-to-end perception or end-to-end decision-making and planning be counted as end-to-end autonomous driving?

In an end-to-end system, every step from sensor input through planning to the final control output is differentiable, so the whole pipeline can be trained as one large model with gradient descent. Through gradient backpropagation, the parameters of every module from input to output are updated and optimized jointly during training, so that the final driving decision trajectory, the behavior the user directly perceives, is optimized globally. Recently, some companies promoting "end-to-end autonomous driving" have claimed to have end-to-end perception or end-to-end decision-making. In my view, neither of these counts as end-to-end autonomous driving; they should rather be called purely data-driven perception and purely data-driven decision planning.
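As a minimal sketch of this idea (an illustrative toy, not Tesla's or anyone's production architecture; all module sizes and the loss choice are assumptions), the PyTorch snippet below shows the defining property: a single imitation loss on the output trajectory back-propagates through the planning head and the perception backbone alike, so both are optimized together.

```python
# Minimal sketch of a fully differentiable "end-to-end" stack: camera frames in,
# a planned trajectory out, one imitation loss updating perception and planning
# jointly. Names and shapes are illustrative assumptions.
import torch
import torch.nn as nn

class EndToEndPlanner(nn.Module):
    def __init__(self, horizon: int = 20):
        super().__init__()
        # "Perception": a tiny CNN encoder standing in for a real BEV backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # "Planning": maps the scene feature to future (x, y) waypoints.
        self.planner = nn.Sequential(
            nn.Linear(32, 64), nn.ReLU(),
            nn.Linear(64, horizon * 2),
        )
        self.horizon = horizon

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feat = self.encoder(images)
        return self.planner(feat).view(-1, self.horizon, 2)

model = EndToEndPlanner()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

images = torch.randn(8, 3, 256, 256)    # dummy camera batch
expert_traj = torch.randn(8, 20, 2)     # dummy human driving trajectories

pred_traj = model(images)
loss = nn.functional.mse_loss(pred_traj, expert_traj)  # imitation loss on the output
loss.backward()                          # gradients reach encoder AND planner
optimizer.step()
```

A purely data-driven perception module or a purely data-driven planner trained in isolation does not have this property: the loss of one module never reaches the parameters of the others.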

Some solutions let a model make the decisions but combine it with a hybrid strategy of traditional methods for safety verification and trajectory optimization; this is also being marketed as end-to-end planning. In addition, some people believe that Tesla's V12 is not a pure model outputting control signals, but a hybrid strategy that mixes in rule-based methods. The well-known Green on http://X.com posted a while ago that rule code can still be found in the V12 technology stack. My understanding is that the code Green discovered is most likely V11 code retained by the highway stack, because V12 currently replaces only the original city stack with the end-to-end model, while the highway still runs the V11 solution. Finding fragments of rule code in the unpacked binary therefore does not mean V12 is a fake "end-to-end"; the code found is very likely highway code. In fact, from the 2022 AI Day we can see that V11 and earlier versions were already hybrid solutions, so if V12 were not outputting trajectories straight from a model, its solution would differ little from earlier versions, and there would be no reasonable explanation for V12's jump in performance. For Tesla's earlier solutions, please refer to my (EatElephant's) interpretation of AI Day: Tesla AI Day 2022 -- the so-called "Spring Festival Gala" of autonomous driving, and a decentralized R&D team eager to transform into an AI technology company.

Judging from the 2022 AI Day, V11 was already a planning solution that mixed in an NN Planner

In general, whether it is perception post-processing code, rule-based scoring of candidate trajectories, or even a safety fallback policy, once rule code with if-else branches is introduced, gradient propagation through the whole system is cut off, and the system loses the biggest advantage of end-to-end training: global optimization.
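A toy example helps make this concrete (this is an assumed sketch, not any production stack). Hand-written rule logic consumes discrete quantities such as object lists and Python branches, which behaves like detaching the tensor from the autograd graph; the module upstream of the rules then receives no gradient from the downstream driving loss.

```python
# Illustration of how a rule layer between modules cuts the gradient path.
import torch
import torch.nn as nn

perception = nn.Linear(10, 4)   # stand-in perception module
planner = nn.Linear(4, 2)       # stand-in planning head
x = torch.randn(1, 10)
expert = torch.randn(1, 2)

# Fully differentiable pipeline: gradients reach both modules.
traj = planner(perception(x))
nn.functional.mse_loss(traj, expert).backward()
print(perception.weight.grad is not None)   # True

perception.zero_grad()
planner.zero_grad()

# Pipeline with rule-based post-processing in the middle: the hand-written
# if/else logic works on plain numbers, which acts like .detach() in the graph.
objects = perception(x).detach()             # rule-code boundary
traj = planner(objects)
nn.functional.mse_loss(traj, expert).backward()
print(perception.weight.grad)                # None (or all-zero): no gradient
                                             # reaches perception, so global
                                             # optimization is lost
```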

Misunderstanding 2: Is end-to-end a complete reinvention of previous technology?

Another common misunderstanding is that end-to-end means overturning all previously accumulated technology and starting over with a thoroughly new approach. Many people assume that since Tesla has only just pushed an end-to-end system to users, other manufacturers no longer need to iterate on their original modular stack of perception, prediction, and planning; they can jump straight into an end-to-end system and, enjoying latecomer advantages, quickly catch up with or even surpass Tesla. It is true that using one large model to map sensor input directly to planning and control signals is the most thorough form of end-to-end, and companies have experimented with similar methods for a long time, Nvidia's DAVE-2 and Wayve among them. But such a thorough end-to-end system is much closer to a black box and is difficult to debug and iterate on. At the same time, because sensor inputs such as images and point clouds form a very high-dimensional input space while the control outputs, such as steering wheel angle and throttle or brake pedal, form a very low-dimensional output space, a model trained this way is essentially impossible to put through real-vehicle testing.
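To illustrate the dimensionality gap (a hedged sketch in the spirit of DAVE-2-style pixels-to-controls regression, not any company's actual model), the snippet below maps a single camera frame directly to two control values; hundreds of thousands of input values are squeezed into two numbers, which is part of why such models are hard to debug and hard to trust on a real vehicle.

```python
# Hypothetical pixels-to-controls regressor, for illustration only.
import torch
import torch.nn as nn

class PixelsToControls(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 48, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(48, 2),          # output: [steering angle, throttle/brake]
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.net(image)

image = torch.randn(1, 3, 360, 640)
controls = PixelsToControls()(image)
print(image.numel(), "input values ->", controls.numel(), "control values")
# 691200 input values -> 2 control values
```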

A thorough end-to-end system will also use common auxiliary tasks such as semantic segmentation and depth estimation to help the model converge and to aid debugging

The FSD V12 we actually see retains almost all of the previous visualization content, which shows that V12 is trained end to end on top of the original strong perception foundation; the FSD stack iterated since October 2020 was not abandoned but became the solid technical base of V12. Andrej Karpathy has answered a similar question before: although he was not involved in V12's development, he believes all the previous technical accumulation was not thrown away but merely moved from the front stage to behind the scenes. End-to-end, in other words, is realized gradually on top of the existing technology by removing rule code step by step.
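One way to read "moved behind the scenes" is that the existing perception tasks keep supervising the end-to-end model as auxiliary heads. The sketch below is an assumed illustration (not Tesla's architecture): a shared backbone feeds a planning head plus segmentation and depth heads, and all losses are summed, so perception labels still shape the representation even though the user-facing output is the trajectory.

```python
# Hedged sketch of auxiliary perception supervision in an end-to-end model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskDrivingModel(nn.Module):
    def __init__(self, horizon: int = 20, num_classes: int = 10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(64, num_classes, 1)   # auxiliary: semantic segmentation
        self.depth_head = nn.Conv2d(64, 1, 1)           # auxiliary: depth estimation
        self.plan_head = nn.Sequential(                 # main task: trajectory
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, horizon * 2),
        )

    def forward(self, images):
        feat = self.backbone(images)
        return self.plan_head(feat), self.seg_head(feat), self.depth_head(feat)

model = MultiTaskDrivingModel()
images = torch.randn(4, 3, 128, 128)
expert_traj = torch.randn(4, 40)
seg_labels = torch.randint(0, 10, (4, 32, 32))
depth_labels = torch.rand(4, 1, 32, 32)

traj, seg, depth = model(images)
loss = (F.mse_loss(traj, expert_traj)               # planning: what the user experiences
        + 0.5 * F.cross_entropy(seg, seg_labels)    # auxiliary supervision
        + 0.5 * F.l1_loss(depth, depth_labels))     # helps convergence and debugging
loss.backward()
```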

V12 retains almost all of FSD's perception visualizations, removing only limited content such as traffic cones

Misunderstanding 3: Can academic end-to-end work be migrated directly to an actual product?

UniAD winning the CVPR 2023 Best Paper award undoubtedly reflects the academic community's high expectations for end-to-end autonomous driving systems. After Tesla introduced its vision-based BEV perception in 2021, domestic academia poured enormous enthusiasm into BEV perception for autonomous driving, and a series of studies promoted the performance optimization and deployment of BEV methods. Can end-to-end follow a similar route, with academia leading and industry following, to push the technology into products quickly through rapid iteration? I think that will be considerably harder. First, BEV perception is still a relatively modular technique, mostly at the algorithm level, and entry-level performance does not demand that much data; the launch of the high-quality open-source dataset nuScenes provided convenient conditions for much of the BEV research. Although a BEV solution iterated on nuScenes cannot meet product-level performance requirements, it has great reference value for proof of concept and model selection. Academia, however, lacks large-scale data usable for end-to-end training. The largest such dataset, nuPlan, contains about 1,200 hours of real-vehicle data collected in 4 cities. Yet at a 2023 earnings call Musk said of end-to-end autonomous driving: "Trained on 1 million video cases, it barely works; 2 million, slightly better; 3 million, you'll feel wow; at 10 million, its performance becomes incredible." Tesla's Autopilot clips are generally believed to be roughly 1-minute segments, so the entry-level 1 million video cases correspond to about 16,000 hours, at least an order of magnitude more than the largest academic dataset. Note also that nuPlan is collected continuously, so its distribution and diversity have fatal flaws: the vast majority of the data are simple scenes. That means an academic dataset like nuPlan cannot even produce a version that barely works on the road.
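A quick back-of-the-envelope check of the gap, under the same assumption stated above that each clip is roughly a 1-minute segment:

```python
# Data-scale comparison, assuming 1-minute clips (as the text assumes).
CLIP_MINUTES = 1
NUPLAN_HOURS = 1_200                      # largest academic planning dataset

for clips in (1_000_000, 2_000_000, 3_000_000, 10_000_000):
    hours = clips * CLIP_MINUTES / 60
    print(f"{clips:>10,} clips ~ {hours:>9,.0f} h ~ {hours / NUPLAN_HOURS:5.1f}x nuPlan")

# 1,000,000 clips ~ 16,667 h ~ roughly 14x nuPlan, i.e. already more than an
# order of magnitude beyond the largest academic dataset.
```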

The nuPlan dataset is already a very large academic dataset, but it may still not be enough for exploring end-to-end solutions

So, looking at the vast majority of end-to-end autonomous driving work, including UniAD, the models cannot run on actual vehicles and can only resort to open-loop evaluation. The reliability of open-loop metrics is very low, because open-loop evaluation cannot expose a model that has confused cause and effect: even a model that has only learned to extrapolate the historical ego trajectory can obtain very good open-loop numbers, yet such a model is completely unusable. In 2023, Baidu published a paper, AD-MLP (https://arxiv.org/pdf/2305.10430), discussing the shortcomings of open-loop planning metrics. Using only historical ego information, without introducing any perception, it achieved very good open-loop scores, even close to some current SOTA work. Obviously, though, no one can drive a car well with their eyes closed!
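The snippet below is a toy planner in the spirit of that argument (illustrative only; it is not the paper's exact architecture, and the state fields are assumptions). It sees only the ego vehicle's own past states, with no camera or lidar input anywhere, yet because real driving logs are dominated by smooth, simple motion, extrapolating history like this can score well on open-loop trajectory metrics while being useless as a real driver.

```python
# History-only "planner": drives with its eyes closed.
import torch
import torch.nn as nn

class HistoryOnlyPlanner(nn.Module):
    def __init__(self, hist_steps: int = 10, future_steps: int = 20, state_dim: int = 4):
        super().__init__()
        # state per step: (x, y, speed, heading) -- an assumed minimal ego state
        self.mlp = nn.Sequential(
            nn.Linear(hist_steps * state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, future_steps * 2),       # future (x, y) waypoints
        )
        self.future_steps = future_steps

    def forward(self, ego_history: torch.Tensor) -> torch.Tensor:
        # ego_history: (batch, hist_steps, state_dim); note: no sensor input at all
        flat = ego_history.flatten(start_dim=1)
        return self.mlp(flat).view(-1, self.future_steps, 2)

planner = HistoryOnlyPlanner()
future = planner(torch.randn(2, 10, 4))
print(future.shape)   # torch.Size([2, 20, 2])
```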

AD-MLP achieves good open-loop metrics without relying on any sensor input, illustrating that open-loop metrics alone have little practical value as a reference

Can closed-loop policy evaluation then solve the problems of open-loop imitation learning? At least for now, academia generally relies on the CARLA closed-loop simulator for end-to-end research and development, but models developed in a game-engine-based simulator like CARLA are also difficult to transfer to the real world.

Misunderstanding 4: Is end-to-end autonomous driving just an algorithm innovation?

Finally, end-to-end is not just a new algorithm. In a modular autonomous driving system, the models of the different modules can be trained and iterated separately, each on the data of its own task. In an end-to-end system, every function is trained at the same time, which requires the training data to be extremely consistent: every clip must carry accurate labels for all sub-tasks, and once the labeling of any one task fails, that clip becomes hard to use for end-to-end training. This places extremely high demands on the success rate and quality of the automatic labeling pipeline. Second, an end-to-end system needs all modules to reach a high performance level before the final decision-and-planning output improves, so the data threshold of an end-to-end system is generally believed to be much higher than the data demand of each individual module, and the threshold concerns not only absolute quantity but also distribution and diversity. This means that suppliers who do not fully control the vehicles and have to adapt to customers with many different vehicle models may encounter greater difficulties when developing an end-to-end system. As for the computing-power threshold, Musk said on X.com in early March this year that the biggest limiting factor for FSD is compute, and recently he said their compute problem has been greatly alleviated. Almost at the same time, at the 2024 Q1 earnings call, Tesla revealed that it now has compute resources equivalent to 35,000 H100s and that this number will reach 85,000 by the end of 2024. There is no doubt that Tesla has very strong engineering capability in compute optimization, which means that to reach the current level of FSD V12, 35,000 H100s and billions of dollars of infrastructure capital expenditure are very likely a necessary precondition; for companies less efficient than Tesla, the threshold may be even higher.
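The labeling-consistency point can be made concrete with a small hypothetical sketch (the pipeline, field names, and task list below are assumptions for illustration, not any real system): a clip only enters the end-to-end training set if every sub-task label succeeded, whereas a modular stack could still use the same clip for whichever tasks did succeed.

```python
# Hypothetical gate on auto-labeled clips for end-to-end training.
from dataclasses import dataclass, field
from typing import Dict

REQUIRED_LABELS = ("detection", "lanes", "occupancy", "ego_trajectory")

@dataclass
class Clip:
    clip_id: str
    labels_ok: Dict[str, bool] = field(default_factory=dict)

def usable_end_to_end(clip: Clip) -> bool:
    # End-to-end training: all sub-task labels must be present and valid.
    return all(clip.labels_ok.get(task, False) for task in REQUIRED_LABELS)

clips = [
    Clip("a", {"detection": True, "lanes": True, "occupancy": True, "ego_trajectory": True}),
    Clip("b", {"detection": True, "lanes": False, "occupancy": True, "ego_trajectory": True}),
]

e2e_set = [c.clip_id for c in clips if usable_end_to_end(c)]
print(e2e_set)   # ['a'] -- clip 'b' is lost to end-to-end training by one failed label
```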

In early March, Musk said the main factor limiting FSD iteration was computing power

In early April, Musk said Tesla's total investment in computing power this year would exceed 10 billion US dollars

In addition, a netizen on http://X.com shared a screenshot of a presentation this year by Norm Marks, an Nvidia automotive-industry executive. It shows that by the end of 2023 the number of Nvidia GPUs owned by Tesla was completely dominant on the histogram (the green arrow at the far right of the left chart; the text in the middle notes that this No. 1 OEM owned more than 7,000 DGX nodes). This OEM is obviously Tesla. At 8 cards per node, Tesla probably had more than 56,000 A100 GPUs by the end of 2023, more than four times the second-ranked OEM, and this does not yet include the 35,000 H100s newly added in 2024. Combined with the US restrictions on GPU exports to China, catching up with this level of compute becomes even more difficult.

Screenshot of Norm Marks' internal sharing, source: X.com@ChrisZheng001

Beyond the data and compute challenges above, an end-to-end system will also run into other new challenges: how to keep the system controllable, how to detect problems as early as possible, and how to fix problems and iterate quickly in a data-driven way when rule code can no longer be used. For most autonomous driving R&D teams at present, these remain unknown challenges.

Finally, end-to-end is also an organizational change for today's autonomous driving R&D teams. Since the L4 era, the organizational structure of most autonomous driving teams has been modular: not only separate perception, prediction, localization, and planning-and-control groups, but often a perception group further split into vision and lidar sub-groups. The end-to-end architecture removes the interface boundaries between these modules, so an end-to-end R&D team needs to integrate all of its people to fit the new technical paradigm, which is a great challenge for an inflexible team culture.
