Interpretation of Tesla's Autonomous Driving Algorithms and Models

Tesla is, at heart, an AI company. It trained 75,000 neural networks over the past year, roughly one new model every 8 minutes, and 281 of those models are deployed on Tesla vehicles. Below we interpret the algorithm and model progress of Tesla FSD from several angles.

01 Perception: Occupancy Network

One of Tesla's key perception technologies this year is the Occupancy Network. Anyone who has studied robotics will be familiar with the occupancy grid: occupancy indicates whether each 3D voxel in space is occupied, represented either as a binary 0/1 value or as a probability in [0, 1].
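To make the representation concrete, here is a minimal sketch, not Tesla's implementation, of a probabilistic 3D occupancy grid in which each voxel stores a value in [0, 1]; the grid extent and resolution are arbitrary assumptions for illustration.

```python
import numpy as np

# Toy probabilistic occupancy grid: 100 m x 100 m x 8 m around the ego vehicle
# at 0.5 m voxel resolution (all numbers are arbitrary choices for illustration).
resolution = 0.5                                          # meters per voxel
occupancy = np.zeros((200, 200, 16), dtype=np.float32)    # P(occupied) in [0, 1]

def world_to_voxel(x, y, z, origin=(-50.0, -50.0, 0.0)):
    """Map an ego-frame coordinate (meters) to a voxel index."""
    return (int((x - origin[0]) / resolution),
            int((y - origin[1]) / resolution),
            int((z - origin[2]) / resolution))

# Mark a small region as likely occupied (e.g. part of an odd-shaped obstacle).
ix, iy, iz = world_to_voxel(10.0, 2.0, 1.0)
occupancy[ix:ix + 8, iy:iy + 4, iz:iz + 4] = 0.9

# Thresholding recovers the binary 0/1 form mentioned above.
occupied = occupancy > 0.5
print(occupied.sum(), "voxels considered occupied")
```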

Why is occupancy estimation important for autonomous driving perception? During driving, common obstacles such as vehicles and pedestrians can be handled with 3D object detection, which estimates their positions and sizes. But there are also long-tail obstacles that matter greatly for driving: 1. deformable obstacles, such as articulated two-section trailers, which are poorly represented by 3D bounding boxes; 2. irregularly shaped obstacles, such as overturned vehicles, for which 3D pose estimation breaks down; 3. obstacles outside known categories, such as stones or debris on the road, which cannot be classified. We therefore want a better representation for these long-tail obstacles: a full estimate of the occupancy of every position in 3D space, and even its semantics and motion (flow).

Tesla uses the example in the figure below to demonstrate the power of the Occupancy Network. Unlike 3D boxes, the occupancy representation makes few geometric assumptions about objects, so it can model objects of arbitrary shape and arbitrary motion. The figure shows a two-section bus pulling away, with blue marking moving voxels and red marking stationary voxels. The Occupancy Network correctly estimates that the first section of the bus has started moving while the second section is still at rest.


Occupancy estimation as a two-section bus starts moving; blue marks moving voxels, red marks stationary voxels

The model structure of the Occupancy Network is shown below. The model first uses RegNet and BiFPN to extract features from multiple cameras; this is consistent with the network structure shared at last year's AI Day, suggesting the backbone has not changed much. The model then performs attention-based multi-camera fusion, using spatial queries carrying 3D positions to attend over the 2D image features. How is the link between a 3D spatial query and a 2D feature map realized? The figure does not detail the fusion method, but there are many public papers to draw on, and I think the most likely solution is one of two. The first is the 3D-to-2D query: project the 3D spatial query onto each 2D feature map using the camera intrinsics and extrinsics, and extract the features at the projected location. This was proposed in DETR3D, and BEVFormer and PolarFormer adopt the same idea. The second is implicit mapping via positional embeddings: attach a meaningful positional embedding (camera intrinsics/extrinsics, pixel coordinates, etc.) to each position of the 2D feature map and let the model learn the 2D-3D correspondence on its own. Finally, the model performs temporal fusion by aligning and stacking 3D feature volumes according to the known changes in the ego vehicle's position and attitude.


Occupancy Network structure
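Since the talk does not reveal the fusion mechanism, the sketch below only illustrates the first candidate, a DETR3D-style 3D-to-2D query: each 3D query point is projected into a camera using its intrinsics and extrinsics, and the 2D feature map is sampled at the projected pixel. All tensor shapes and the function name are assumptions for illustration, not Tesla's implementation.

```python
import torch
import torch.nn.functional as F

def sample_features_3d_to_2d(query_xyz, feat_2d, intrinsics, ego_to_cam):
    """
    DETR3D-style 3D-to-2D query, as an illustration only (not Tesla's code).
    query_xyz : (N, 3) 3D query points in the ego frame
    feat_2d   : (C, H, W) feature map from one camera
    intrinsics: (3, 3) camera intrinsic matrix
    ego_to_cam: (4, 4) ego-frame-to-camera-frame transform
    returns   : (N, C) features sampled at the projected pixel locations
    """
    n = query_xyz.shape[0]
    homo = torch.cat([query_xyz, torch.ones(n, 1)], dim=1)          # (N, 4)
    pts_cam = (ego_to_cam @ homo.T).T[:, :3]                        # (N, 3) camera frame
    uvw = (intrinsics @ pts_cam.T).T                                # (N, 3)
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-5)                   # pixel coords (points behind the camera are invalid)

    c, h, w = feat_2d.shape
    grid = torch.stack([uv[:, 0] / (w - 1) * 2 - 1,                 # normalize to [-1, 1]
                        uv[:, 1] / (h - 1) * 2 - 1], dim=-1).view(1, n, 1, 2)
    sampled = F.grid_sample(feat_2d.unsqueeze(0), grid, align_corners=True)  # (1, C, N, 1)
    return sampled.view(c, n).T                                     # (N, C)

feats = sample_features_3d_to_2d(torch.randn(64, 3), torch.randn(256, 48, 160),
                                 torch.eye(3), torch.eye(4))
```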

After feature fusion, a deconvolution-based decoder decodes the occupancy, semantics and flow of each 3D position. The presentation stressed that because this output is dense, its resolution is limited by memory. Anyone who works on image segmentation will recognize this headache, and here the task is 3D segmentation while autonomous driving needs very fine resolution (~10 cm). Therefore, inspired by neural implicit representations, an implicit, queryable MLP decoder is attached at the end of the model: given any coordinate (x, y, z), it decodes the information at that position, namely occupancy, semantics and flow. This breaks the model's resolution limit, which I consider a highlight of the design.
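Below is a minimal sketch of what such an implicit, queryable decoder could look like: an MLP that takes a fused spatial feature plus a continuous (x, y, z) query and returns occupancy, semantic logits and flow. The layer sizes, class count and the way the fused feature is obtained are assumptions, not Tesla's actual design.

```python
import torch
import torch.nn as nn

class ImplicitOccupancyDecoder(nn.Module):
    """Minimal sketch of an implicit, queryable MLP decoder (sizes are assumptions)."""
    def __init__(self, feat_dim=256, num_classes=8, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU())
        self.occupancy_head = nn.Linear(hidden, 1)            # P(occupied)
        self.semantics_head = nn.Linear(hidden, num_classes)  # class logits
        self.flow_head = nn.Linear(hidden, 3)                 # 3D motion vector

    def forward(self, fused_feature, xyz):
        # fused_feature: (B, feat_dim) feature sampled from the fused volume at each query
        # xyz:           (B, 3) arbitrary continuous query coordinates
        h = self.mlp(torch.cat([fused_feature, xyz], dim=-1))
        return (torch.sigmoid(self.occupancy_head(h)),
                self.semantics_head(h),
                self.flow_head(h))

# Any continuous (x, y, z) can be queried, so the output is not tied to a fixed voxel grid.
decoder = ImplicitOccupancyDecoder()
occ, sem, flow = decoder(torch.randn(4, 256), torch.randn(4, 3))
```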

02 Planning: Interactive Planning

Planning is another key module of autonomous driving. This time Tesla mainly emphasizes modeling interaction at complex intersections. Why is interaction modeling so important? Because the future behavior of other vehicles and pedestrians carries inherent uncertainty, a capable planning module must predict, online, the many possible interactions between the ego vehicle and other agents, evaluate the risk each interaction brings, and finally decide which strategy to follow.

Tesla calls its planning model Interaction Search. It consists of three main steps: tree search, neural-network trajectory planning, and trajectory scoring.

1. Tree search is a common algorithm for trajectory planning that can effectively enumerate interaction scenarios and find good solutions, but its biggest difficulty is that the search space is enormous. At a complex intersection there may be around 20 vehicles relevant to the ego car, combining into more than 100 interaction patterns, each with dozens of candidate spatio-temporal trajectories. Tesla therefore does not search over trajectories directly; instead, a neural network scores the target positions (goals) that might be reached after some time horizon and keeps only a small number of promising goals.

2. Once a goal is chosen, a trajectory to reach it must be determined. Traditional planning usually solves this with optimization, which is not hard in itself (each optimization takes roughly 1 to 5 milliseconds), but with many candidate goals coming out of the previous step the time cost becomes unacceptable. Tesla therefore uses another neural network for trajectory planning, enabling highly parallel planning over many candidate goals. The trajectory labels for training this network come from two sources: first, real human driving trajectories, which are only one of many good solutions; second, additional trajectories produced by an offline optimization algorithm.

3. Given a set of feasible trajectories, an optimal one must be chosen. The approach is to score each trajectory, combining hand-crafted risk metrics, comfort metrics and a neural-network scorer (a toy sketch of all three steps follows this list).
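As a toy illustration of the three decoupled steps (goal scoring, parallel neural trajectory planning, and trajectory scoring with hand-crafted plus learned terms), here is a minimal sketch; every module, feature size and cost term is a stand-in for illustration, not Tesla's Interaction Search.

```python
import torch
import torch.nn as nn

SCENE_DIM, TRAJ_LEN = 128, 30  # illustrative sizes

goal_scorer = nn.Sequential(nn.Linear(SCENE_DIM + 2, 64), nn.ReLU(), nn.Linear(64, 1))
trajectory_planner = nn.Sequential(nn.Linear(SCENE_DIM + 2, 256), nn.ReLU(),
                                   nn.Linear(256, TRAJ_LEN * 2))     # 30 future (x, y) points
trajectory_scorer = nn.Sequential(nn.Linear(TRAJ_LEN * 2, 64), nn.ReLU(), nn.Linear(64, 1))

def plan_step(scene_feat, candidate_goals, top_k=8):
    """scene_feat: (SCENE_DIM,) encoded scene; candidate_goals: (G, 2) goal positions."""
    g = candidate_goals.shape[0]
    feats = torch.cat([scene_feat.expand(g, -1), candidate_goals], dim=-1)
    # 1. Score goals with a network and keep only a few promising ones (prunes the search).
    top = torch.topk(goal_scorer(feats).squeeze(-1), k=min(top_k, g)).indices
    # 2. Plan one trajectory per surviving goal, in parallel, with a neural planner.
    trajs = trajectory_planner(feats[top]).view(-1, TRAJ_LEN, 2)
    # 3. Score trajectories with a hand-crafted term plus a learned score, pick the best.
    comfort_cost = trajs.diff(dim=1).diff(dim=1).norm(dim=-1).mean(dim=-1)  # acceleration proxy
    learned_score = trajectory_scorer(trajs.flatten(1)).squeeze(-1)
    return trajs[(learned_score - comfort_cost).argmax()]

best_trajectory = plan_step(torch.randn(SCENE_DIM), torch.randn(20, 2))
```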

By decoupling these three steps, Tesla has built an efficient trajectory planning module that accounts for interaction. There are not many papers on neural-network-based trajectory planning to refer to; my paper TNT [5] is closely related to this approach, decomposing trajectory prediction into the same three steps: goal scoring, trajectory planning, and trajectory scoring. Interested readers can look up the details. In addition, our research group has been exploring behavioral interaction and planning, and we welcome attention to our latest work, InterSim [6].


Interaction Search Planning Model Structure

03 Vector Map: Lanes Network

Personally, I think another major technical highlight of this AI Day is the online vector map construction model, Lanes Network. Those who followed last year's AI Day may remember that Tesla already performed full online map segmentation and recognition in BEV space. So why build a Lanes Network at all? Because pixel-level segmented lanes are not enough for trajectory planning; we also need the topology of the lane lines so the car knows how it can move from one lane to another.

Let's first look at what a vector map is. As shown in the figure, Tesla's vector map consists of a set of lane centerlines (blue) and key points (connection, fork and merge points), with their connectivity expressed as a graph.


Vector map: the dots are lane-line key points and the blue curves are lane centerlines
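A minimal sketch of how such a lane graph might be stored, with centerline key points typed by their role and directed edges for connectivity, is shown below; the field names and types are illustrative assumptions rather than Tesla's schema.

```python
from dataclasses import dataclass, field

# A toy lane-graph container; field names are illustrative, not Tesla's schema.
@dataclass
class LaneKeypoint:
    x: float
    y: float
    kind: str                                       # "start", "continuation", "fork" or "merge"

@dataclass
class LaneGraph:
    keypoints: list = field(default_factory=list)   # list[LaneKeypoint]
    edges: list = field(default_factory=list)       # directed (from_index, to_index) pairs

    def add_keypoint(self, kp):
        self.keypoints.append(kp)
        return len(self.keypoints) - 1

    def connect(self, a, b):
        self.edges.append((a, b))

# Two lanes that merge: the edges tell the planner which lane transitions are possible.
g = LaneGraph()
a = g.add_keypoint(LaneKeypoint(0.0, 0.0, "start"))
b = g.add_keypoint(LaneKeypoint(0.0, 3.5, "start"))
m = g.add_keypoint(LaneKeypoint(30.0, 1.75, "merge"))
g.connect(a, m)
g.connect(b, m)
```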

In terms of model structure, Lanes Network is a decoder built on top of the perception network's backbone. Compared with decoding occupancy and semantics for each voxel, decoding a sparse set of connected lane lines is harder: the number of outputs is not fixed, and there are logical relationships among the outputs.

Tesla borrows from the Transformer decoder used in language models and outputs results autoregressively, as a sequence. Concretely, a generation order is fixed first (say left to right, top to bottom) and the space is discretized (tokenized); Lanes Network then predicts a series of discrete tokens. As shown in the figure, the network first predicts a node's coarse position (index 18) and precise position (index 31), then the node's semantics ("Start", i.e. the starting point of a lane line), and finally its connectivity attributes, such as fork/merge/curvature parameters. The network generates all lane-line nodes in this autoregressive fashion.


Lanes Network structure
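To illustrate the autoregressive, language-model-style decoding described above, here is a toy greedy decoding loop: each step attends to the image features and emits the next discrete token until an end token appears. The vocabulary, token-ordering convention and model sizes are all assumptions for illustration.

```python
import torch
import torch.nn as nn

VOCAB_SIZE, START_TOKEN, END_TOKEN = 1024, 0, 1   # toy vocabulary of discretized tokens

decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=128, nhead=8, batch_first=True), num_layers=2)
token_embed = nn.Embedding(VOCAB_SIZE, 128)
output_head = nn.Linear(128, VOCAB_SIZE)

def generate_lane_tokens(image_memory, max_len=64):
    """image_memory: (1, S, 128) perception features; returns a greedy token sequence."""
    tokens = [START_TOKEN]
    for _ in range(max_len):
        tgt = token_embed(torch.tensor([tokens]))                   # (1, T, 128)
        h = decoder(tgt, image_memory)                              # attend to image features
        next_token = output_head(h[:, -1]).argmax(dim=-1).item()    # pick the next token
        tokens.append(next_token)
        if next_token == END_TOKEN:                                 # stop when the graph is complete
            break
    # By convention the stream would alternate coarse position / fine position /
    # node type / connectivity-attribute tokens, as in the figure.
    return tokens

tokens = generate_lane_tokens(torch.randn(1, 100, 128))
```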

It is worth noting that autoregressive sequence generation is not exclusive to language Transformer models. Our research group has published two related papers on generating vector maps in the past few years, HDMapGen [7] and VectorMapNet [8]. HDMapGen uses a graph attention network (GAT) to autoregressively generate the key points of a vector map, which is similar in spirit to Tesla's solution; VectorMapNet tackles the problem with a Detection Transformer (DETR), using a set-prediction formulation to generate vector maps more quickly.


HDMapGen vector map generation result


VectorMapNet vector map generation results

04 Auto Labeling

Auto labeling is a technology Tesla also explained at last year's AI Day; this year the focus is on automatic annotation for Lanes Network. Tesla vehicles generate 500,000 driving trips every day, and making good use of this data can greatly help lane-line prediction.

Tesla's automatic lane annotation has three steps:

1. Estimate high-precision trajectories for all trips using visual-inertial odometry.

2. Reconstruct the map from many vehicles and many trips; this is the most critical step of the scheme. The basic motivation is that different vehicles may observe the same location from different viewpoints and at different times, so aggregating this information yields a better map reconstruction. The technical points of this step include geometric matching between maps and joint optimization of the results.

3. Automatically annotate lanes for new trips. With a high-precision offline map reconstruction in hand, a new trip only requires a simple geometric matching step to obtain pseudo-labels for its lane lines (a toy sketch follows this list). Pseudo-labels obtained this way can sometimes be even better than manual annotation, for example at night or in rain and fog.


Lanes Network automatic annotation
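As a toy sketch of the pseudo-labeling step, the snippet below transforms lane points from the reconstructed (world) frame into the ego frame at one timestamp of a new trip, which is the kind of geometric matching that turns the offline map into per-frame lane labels; the pose format and the 50 m window are assumptions, not Tesla's pipeline.

```python
import numpy as np

def lanes_to_trip_frame(lane_points_world, ego_pose_world, max_range=50.0):
    """
    Illustrative pseudo-labeling step: express reconstructed lane points in the ego
    frame of one timestamp of a new trip (pose format and range window are assumptions).
    lane_points_world: (N, 3) lane points in the shared reconstruction/world frame
    ego_pose_world   : (4, 4) ego pose in the world frame at this timestamp
    """
    world_to_ego = np.linalg.inv(ego_pose_world)
    homo = np.hstack([lane_points_world, np.ones((len(lane_points_world), 1))])
    pts_ego = (world_to_ego @ homo.T).T[:, :3]
    near = np.linalg.norm(pts_ego[:, :2], axis=1) < max_range   # keep points near the ego vehicle
    return pts_ego[near]

# The transformed points serve as lane-line pseudo-labels for training on the new trip.
pseudo_labels = lanes_to_trip_frame(np.random.rand(1000, 3) * 100.0, np.eye(4))
```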

05 Simulation

Simulation of visual imagery has been a popular direction in computer vision in recent years. In autonomous driving, the main purpose of visual simulation is to generate rare scenes on demand rather than hoping to encounter them in real road tests; for example, a large truck lying across the road has long been a headache scenario for Tesla. But visual simulation is not a simple problem: for a complex intersection (Market Street in San Francisco), a traditional modeling-and-rendering pipeline takes a designer two weeks, whereas Tesla's AI-based solution now takes only five minutes.


An intersection reconstructed in visual simulation

Specifically, visual simulation starts from automatically labeled real-world road information plus a rich library of graphics assets, and then proceeds through the following steps in sequence (a toy sketch of the sequence follows the list):

1. Pavement generation: fill in the road surface within the curbs, including details such as road slope and surface material.

2. Lane line generation: draw lane line information on the road surface.

3. Plant and building generation: randomly generate and render plants and buildings along and between roads. This is not only for visual appeal; it also reproduces the occlusion these objects cause in the real world.

4. Other road elements: generate traffic lights and street signs, and import the lanes and their connectivity.

5. Add dynamic elements such as vehicles and pedestrians.
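The five steps above read naturally as a procedural generation pipeline. Below is a toy sketch of that sequencing only; every function is a trivial stand-in rather than Tesla's tooling, since the real pipeline produces rendered 3D assets.

```python
# A toy outline of the five-step procedural generation sequence above. Every
# function is a trivial placeholder; the point is only to show that the scene
# is assembled layer by layer from real road data and an asset library.

def fill_pavement(road_edges):                 # 1. road surface, slope, material
    return {"bounded_by": road_edges, "material": "asphalt"}

def paint_lane_markings(pavement, lane_graph): # 2. lane lines drawn on the surface
    return {"pavement": pavement, "lanes": lane_graph}

def scatter_assets(road_edges, asset_library): # 3. plants/buildings, which also act as occluders
    return list(asset_library)

def place_road_elements(lane_graph):           # 4. traffic lights, signs, lane connectivity
    return {"traffic_lights": 2, "signs": 4, "connectivity": lane_graph}

def spawn_dynamic_agents(traffic_spec):        # 5. vehicles and pedestrians
    return {"vehicles": traffic_spec.get("vehicles", 0),
            "pedestrians": traffic_spec.get("pedestrians", 0)}

def build_simulated_scene(road_edges, lane_graph, asset_library, traffic_spec):
    pavement = fill_pavement(road_edges)
    return {
        "pavement": pavement,
        "lanes": paint_lane_markings(pavement, lane_graph),
        "scenery": scatter_assets(road_edges, asset_library),
        "fixtures": place_road_elements(lane_graph),
        "agents": spawn_dynamic_agents(traffic_spec),
    }

scene = build_simulated_scene(road_edges=[(0, 0), (120, 0)], lane_graph={"lanes": 2},
                              asset_library=["tree", "building"],
                              traffic_spec={"vehicles": 8, "pedestrians": 3})
```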

06 Infrastructure

Finally, a brief word on the foundation underneath all of Tesla's software technology: powerful infrastructure. Tesla's supercomputing center has 14,000 GPUs and 30 PB of data cache in total, with 500,000 new videos flowing into these supercomputers every day. To process this data more efficiently, Tesla built an accelerated video-decoding library as well as a file format, .smol, that speeds up reading and writing of intermediate features. In addition, Tesla has developed its own chip, Dojo, for the supercomputing center, which we will not cover here.


Supercomputing center for video model training

07 Summary

With the AI Day releases of the past two years, we have gradually seen Tesla's technical landscape for automated (assisted) driving take shape. We have also seen Tesla continuously iterating on itself, for example moving from 2D perception to BEV perception and now to the Occupancy Network. Autonomous driving is a journey of a thousand miles; what supports the evolution of Tesla's technology? I see three things: the full-scene understanding that vision algorithms provide, the model iteration speed that massive compute supports, and the generalization that massive data brings. Are these not exactly the three pillars of the deep learning era?
