Li Auto joins the "end-to-end" race, even if it's only a PPT for now
The "end-to-end" trend is sweeping China's smart driving industry.
In the second half of 2024, any car company that discusses intelligent driving without mentioning "end-to-end" will most likely be seen as falling behind.
On July 5, Li Auto released a new autonomous driving technology architecture built on an end-to-end model, a VLM (vision-language model) and a world model. The release amounts to Li Auto's statement of methodology for its end-to-end path, and it lays out the next stage of the company's intelligent driving development more fully than before.
From the perspective of Huxiu Automobile, three questions about this release deserve attention: How does Li Auto's "end-to-end" differ from other players'? How far has Li Auto's intelligent driving development actually progressed? And why is Li Auto emphasizing its intelligent driving capabilities now?
Compared with Huawei, Li Auto's solution is more radical
Let's first look at Li Auto's new autonomous driving architecture. Inspired by Nobel laureate Daniel Kahneman's theory of fast and slow thinking, it models human thinking and decision-making in autonomous driving with a "fast system" and a "slow system" that work together.
The fast system, also known as System 1, is good at handling simple tasks. It corresponds to the human intuition formed from experience and habit, and it is enough to handle 95% of routine driving scenarios.
The slow system, also known as System 2, corresponds to the logical reasoning, complex analysis and computation that humans develop through deeper understanding and learning. It is used to handle complex or even unknown traffic scenarios, which account for about 5% of daily driving.
In this prototype architecture, System 1 is implemented by the end-to-end model, which receives sensor input and directly outputs the driving trajectory used to control the vehicle. System 2 is implemented by the VLM, which receives sensor input, reasons about it, and then outputs decision information to System 1. The autonomous driving capability formed by the two systems is trained and verified in the cloud using the world model.
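The division of labor described above can be illustrated with a minimal sketch. This is not Li Auto's implementation: the names `Advice`, `system1_plan`, `system2_reason` and `drive`, the scene labels, the speeds and the update interval are all hypothetical. The sketch assumes only what the text states, namely that System 1 always produces a trajectory while System 2 reasons more slowly and feeds decision information to System 1.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Toy trajectory: (seconds ahead, target speed in m/s). Real trajectories
# also carry positions; speed alone keeps the sketch short.
Trajectory = List[Tuple[float, float]]


@dataclass
class Advice:
    """Hypothetical decision hint passed from System 2 to System 1."""
    reason: str
    speed_cap_mps: float


def system1_plan(cruise_speed_mps: float, advice: Optional[Advice]) -> Trajectory:
    """Fast path: runs on every tick and always returns a trajectory."""
    speed = cruise_speed_mps
    if advice is not None:
        speed = min(speed, advice.speed_cap_mps)
    return [(step * 0.5, speed) for step in range(1, 5)]


def system2_reason(scene: str) -> Optional[Advice]:
    """Slow path: stand-in for the VLM; returns a hint only for hard scenes."""
    if scene == "bumpy_road":
        return Advice(reason="rough surface ahead", speed_cap_mps=5.0)
    return None


def drive(scenes: List[str], cruise_speed_mps: float = 15.0,
          slow_every: int = 3) -> List[Trajectory]:
    """Run System 1 on every tick and System 2 only on every `slow_every`-th tick."""
    advice: Optional[Advice] = None
    plans = []
    for tick, scene in enumerate(scenes):
        if tick % slow_every == 0:
            advice = system2_reason(scene)  # refresh the hint at a lower rate
        plans.append(system1_plan(cruise_speed_mps, advice))
    return plans
```

One consequence of this design is visible in the sketch: because the slow path refreshes less often, its last hint stays in force until the next refresh, so System 1 keeps driving without ever waiting for System 2.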
According to Li Auto, the end-to-end model of System 1 adopts a One Model solution whose inputs come mainly from cameras and lidar. Multi-sensor features are extracted and fused by a CNN backbone network and projected into BEV (bird's-eye-view) space.
In addition, Li Auto adds vehicle status information and navigation information at the input. After being encoded by a Transformer model, this information is decoded together with the BEV features to produce dynamic obstacles, road structure and general obstacles, and to plan the driving trajectory.
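To make the idea of BEV space concrete, here is a deliberately simplified sketch. It is a stand-in, not the learned feature projection the article describes: it rasterizes raw 3D points into a top-down grid by counting points per cell. The function name `points_to_bev`, the grid extents and the 1 m cell size are assumptions chosen for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x forward, y left, z up) in metres, ego frame


def points_to_bev(points: List[Point],
                  x_range: Tuple[float, float] = (0.0, 40.0),
                  y_range: Tuple[float, float] = (-20.0, 20.0),
                  cell_m: float = 1.0) -> List[List[int]]:
    """Count points per top-down grid cell; rows index x, columns index y."""
    rows = int((x_range[1] - x_range[0]) / cell_m)
    cols = int((y_range[1] - y_range[0]) / cell_m)
    grid = [[0] * cols for _ in range(rows)]
    for x, y, _z in points:
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue  # outside the area covered by the grid
        row = int((x - x_range[0]) / cell_m)
        col = int((y - y_range[0]) / cell_m)
        grid[row][col] += 1  # height z is dropped in this flat projection
    return grid
```

In a production model each cell would hold a learned feature vector fused from several sensors rather than a point count, but the top-down grid layout is the same.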
Compared with the segmented end-to-end solutions adopted by manufacturers such as Huawei and Xpeng, Li Auto's One Model solution is more radical. Tesla also uses a One Model solution, but its "image in, control out" approach goes a step further than Li Auto's "sensor information in, driving trajectory out".
It should be noted that the end-to-end paths currently taken by different manufacturers reflect different choices, not a ranking of better and worse. (For the technical principles of end-to-end, see the Huxiu Automobile team's detailed analysis in the article "Tesla is going to war with Huawei".)
What is really distinctive about Li Auto's architecture is System 2, which is based on the VLM. Its algorithm architecture consists of a unified Transformer model: the prompt text is encoded by a tokenizer, the forward-facing camera images and navigation map information are encoded as visual information, an image-text alignment module then aligns the two modalities, and finally unified autoregressive inference outputs an understanding of the environment, driving decisions and driving trajectories, which are passed to System 1 to assist in controlling the vehicle.
In real scenarios, if System 2 detects that the road surface ahead is very bumpy, it sends a speed-reduction reminder to System 1 and tells the driver that the vehicle will slow down on the potholed road ahead to reduce jolting. It can also recognize bus lanes, tidal lanes and the like.
In Li Auto's words, System 2 is like having a driving school instructor in the front passenger seat, monitoring driving behavior at all times. It is worth mentioning that Xpeng's large language model XBrain and Haomo's large autonomous driving semantic perception model have similar capabilities.
Li Auto's VLM reportedly has 2.2 billion parameters, and its on-vehicle inference time has been optimized from 4.1 seconds to 0.3 seconds.
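A short worked calculation shows why that latency reduction matters on the road. The two latencies come from the article; the 60 km/h speed is an assumption picked for illustration, and `distance_during_inference` is a hypothetical helper.

```python
def distance_during_inference(speed_kmh: float, latency_s: float) -> float:
    """Metres travelled while one inference is still running."""
    return speed_kmh / 3.6 * latency_s


before_s, after_s = 4.1, 0.3   # latencies reported in the article
speed_kmh = 60.0               # assumed urban speed, not from the article

speedup = before_s / after_s   # how many times faster one inference is
rate_hz = 1.0 / after_s        # inferences per second at the new latency
dist_before = distance_during_inference(speed_kmh, before_s)
dist_after = distance_during_inference(speed_kmh, after_s)
```

Under these assumptions the model is roughly 13.7 times faster and can run a little over three times per second. At 60 km/h the car covers about 68 m during one 4.1-second inference but only about 5 m during a 0.3-second one.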
In addition to the dual systems, Li Auto also introduced its testing and verification methods for the end-to-end solution. The mainstream approach in the industry is to run simulation tests using 3D virtual environments, reconstructed simulation, generative simulation and so on. Li Auto's approach combines reconstructed simulation with generative simulation, which is comparable to reconstructing real exam questions and generating mock exam questions.
In fact, Tesla also uses large models to generate continuous video to build its World Model. GAIA-1, the large autonomous driving model from the autonomous driving company Wayve (which already has 9 billion parameters), can likewise generate driving-scene videos, describe scenarios and make predictions.
Overall, Li Auto's technical architecture deploys the dual systems on the vehicle. The One Model end-to-end model lets the autonomous driving system behave like an experienced human driver; the VLM gives the system human-like logical reasoning; and the world model provides an environment for learning and examination, which enables rapid iteration.
According to Lang Xianpeng, head of intelligent driving at Li Auto, the end-to-end solution has been in internal incubation and pre-research since the second half of last year. The company has now completed prototype verification of the model and deployment on real vehicles.
However, this solution is still hard to deliver to users for now. What Li Auto is pushing to AD Max users this month is its map-free NOA solution.
Intelligent driving reaches its overtaking moment
"End-to-end" is becoming an important direction in the pursuit of intelligent driving by various manufacturers.
In March this year, Yuanrong Qixing successfully deployed its end-to-end model on vehicles. When Huawei released Qiankun 3.0 in April, it said its technology had shifted to a new GOD/PDP network architecture that handles pre-decision and planning in a single network. In May, Xpeng announced at its AI DAY that its end-to-end large model had entered mass production. In addition, manufacturers including NIO (Weilai), Xiaomi and Xpeng have reorganized their intelligent driving teams around end-to-end.
Manufacturers across the board, Tesla included, are currently exploring the end-to-end direction. Although their options and paths differ, what is certain is that end-to-end has become the direction of intelligent driving.
However, end-to-end raises both the ceiling and the floor of an intelligent driving system. While it can improve intelligent driving capabilities, it also brings safety problems that are hard to solve. An end-to-end model is a neural network black box, and its lack of interpretability raises safety concerns.
While companies are racing to position themselves, Li Auto is the first car company to disclose its end-to-end technical solution. Li Xiang himself disclosed the two systems at the Chongqing Forum last month, which set off heated discussion in the industry. Announcing the full technical architecture now can be described as striking while the iron is hot.
Given that Li Auto will not release new products in the second half of the year, being first to showcase its intelligent driving capabilities keeps the company in the spotlight and also maintains the competitiveness of its existing products. In addition, committing to the end-to-end path gives Li Auto a chance to catch up in intelligent driving.
Compared with the segmented end-to-end approach adopted by Huawei, Xpeng and others, Li Auto's end-to-end model is harder to implement. How long it will take to go from PPT to mass production, and how well it will work, remains to be seen.
According to the "End-to-End Autonomous Driving Industry Research Report" released by Chentao Capital, modular end-to-end solutions from domestic autonomous driving companies may enter mass production in 2025. As the saying goes, you find out whether it is a mule or a horse by taking it out for a walk, and next year is when that walk happens.
This article is reprinted from Kuai Technology. The opinions in the article only represent the author's personal views. This site only stores information