Let's talk about end-to-end and next-generation autonomous driving systems, as well as some misunderstandings about end-to-end autonomous driving


WBOY

Apr 15, 2024 pm 04:13 PM

end-to-end, Autopilot

Over the past month, for some well-known reasons, I have had very intensive exchanges with many colleagues and peers in the industry. An unavoidable topic in those exchanges is, naturally, end-to-end and the much-discussed Tesla FSD V12. I would like to take this opportunity to sort out my current thoughts and opinions for your reference and discussion.


# How should we define an end-to-end autonomous driving system, and which problems should we expect end-to-end to solve?

In the most traditional definition, an end-to-end system takes raw sensor information as input and directly outputs the variables the task cares about. In image recognition, for example, a CNN can be called end-to-end in contrast to the traditional pipeline of a feature extractor plus a classifier. In autonomous driving, data from the various sensors (camera, LiDAR, radar, IMU, etc.) goes in, and control signals for the vehicle (throttle, steering wheel angle, etc.) come out directly. To account for adaptation across different vehicle models, the output can also be relaxed to the vehicle's trajectory. This is the traditional definition, or what I call the narrow definition of end-to-end. On top of it, some approaches add supervision on intermediate tasks to improve performance.

Beyond this narrow definition, however, we should ask what the essence of end-to-end really is. I think it is the lossless transmission of perception information. Recall what the interface between the perception and PnC (planning and control) modules looks like in a non-end-to-end system. Typically there is detection, attribute analysis, and prediction for whitelisted objects (cars, pedestrians, etc.), plus an understanding of the static environment (road structure, speed limits, traffic lights, etc.). A more careful system also detects generic obstacles. From a macro perspective, the information that perception outputs is an abstraction of a complex driving scene, and an explicit, manually defined one. In unusual scenarios, though, this explicit abstraction cannot fully express the factors that affect driving behavior, or the tasks we would need to define become too numerous and too trivial to enumerate. An end-to-end system therefore provides a comprehensive (perhaps implicit) representation, in the hope that this information reaches PnC automatically and without loss. I think any system that meets this requirement can be called end-to-end in the generalized sense.
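To make the loss of information concrete, here is a minimal sketch of an explicit, hand-defined perception-to-PnC interface. Everything in it is hypothetical and chosen only for illustration: the `WHITELIST` classes, the `TrackedObject` and `ExplicitInterface` types, and the `fallen_cargo` detection do not come from any real system.

```python
from dataclasses import dataclass, field

# Hypothetical whitelist of object classes the explicit interface can express.
WHITELIST = {"car", "pedestrian", "cyclist"}


@dataclass
class TrackedObject:
    kind: str   # one of WHITELIST
    x: float    # longitudinal position in metres (ego frame)
    y: float    # lateral position in metres (ego frame)


@dataclass
class ExplicitInterface:
    """Hand-defined abstraction passed from perception to PnC."""
    objects: list = field(default_factory=list)


def to_explicit(raw_detections):
    """Keep only what the hand-defined schema can express; the rest is lost."""
    kept = [TrackedObject(d["kind"], d["x"], d["y"])
            for d in raw_detections if d["kind"] in WHITELIST]
    return ExplicitInterface(objects=kept)


raw = [
    {"kind": "car", "x": 30.0, "y": 0.0},
    {"kind": "fallen_cargo", "x": 12.0, "y": 0.2},  # not in the whitelist
]
explicit = to_explicit(raw)
dropped = [d["kind"] for d in raw if d["kind"] not in WHITELIST]
print(len(explicit.objects), dropped)
```

In this toy case the planner never learns about the cargo on the road, because the schema has no slot for it. A generalized end-to-end interface, in the sense used above, would pass a representation that is not filtered through such a fixed schema.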

As for other problems, such as optimizing behavior in dynamic interaction scenarios, my personal view is that end-to-end is at least not the only way to solve them; traditional methods can solve them too. Of course, when the amount of data is large enough, end-to-end may provide a fairly good solution. Whether that is necessary is discussed in the questions below.


# Some misunderstandings about end-to-end autonomous driving

An end-to-end system must output control signals or waypoints

If you agree with the generalized notion of end-to-end described above, this one is easy to understand. End-to-end emphasizes the lossless transmission of information, not the direct output of task variables. The narrow approach of outputting them directly needs many fallback solutions to ensure safety and runs into other problems in practice, which are discussed further below.

The end-to-end system must be based on large models or pure vision


There is no necessary connection among end-to-end autonomous driving, large-model autonomous driving, and pure-vision autonomous driving. These are three distinct concepts. An end-to-end system does not have to be driven by a large model in the traditional sense, nor does it have to be purely visual. The three are related in some ways, but they are not equivalent.

An earlier article of mine elaborates on the relationship between these concepts. For details, see: https://zhuanlan.zhihu.com/p/664189972

In the long run, can the narrow end-to-end system described above achieve autonomous driving above the L3 level?

Actually, let me vent first. Those who claim large models will overturn L4 have never actually built L4; those who claim end-to-end cures everything have never worked on PnC. So many conversations with end-to-end enthusiasts turn into purely religious disputes that can be neither verified nor falsified. Those of us working in cutting-edge research and development should be more pragmatic and pay attention to evidence. At the very least, you should have some basic knowledge of what you want to overturn and understand the thorny problems involved. That is basic scientific literacy.

Back to the subject: at present, I am pessimistic. Even FSD, which currently claims to be purely end-to-end, is far from the reliability and stability required above the L3 level. And even if such a system someday becomes statistically as safe as a human, it will still face the problem of aligning its errors with those of human drivers. To put it bluntly, whether an autonomous driving system is accepted by the public and by public opinion may hinge not on its absolute accident and fatality rates, but on whether people can accept the machine making mistakes in scenarios that are relatively easy for humans. This requirement is even harder for a pure end-to-end system to meet. I explained this in more detail in an answer I gave in 2021; see:

How should we view Robin Li's Moments post: "Driverless vehicles will certainly have accidents, but with a much lower probability than human-driven ones"?

https://www.zhihu.com/question/530828899/answer/2590673435?utm_psn=1762524415009697792

Take Waymo and Cruise in North America as examples. Both have in fact had a number of accidents, so why was Cruise's last accident so unacceptable to regulators and the public? The accident involved two stages of harm. The first collision would have been quite difficult even for a human driver to avoid, and was in fact acceptable. After that collision, however, a serious secondary injury occurred: the system misjudged the location of the collision and the location of the injured person. In order not to block traffic, it downgraded to pull-over mode and dragged the injured person for a long time. No normal human driver would behave this way, and the impact was very bad. The incident led directly to the subsequent turmoil at Cruise, and it is a wake-up call for us. How to prevent such things from happening should be a serious consideration in the development and operation of autonomous driving systems.

So at this moment, what are the practical solutions for the next generation of mass-produced assisted driving systems?

Put simply, I think a suitable system should first fully explore the upper limit of what traditional systems can do, and then combine that with the flexibility and generality of end-to-end: in other words, a gradual path to end-to-end. Of course, how to combine the two organically is paid content, haha. But we can still analyze what today's so-called end-to-end or learning-based planners are actually doing.

As far as my limited understanding goes, when a so-called end-to-end model runs on the vehicle, its output trajectory is followed by a fallback scheme based on traditional methods; alternatively, the learning-based planner and a traditional trajectory planning algorithm each output trajectories, and a selector picks one for execution. With such an architecture, the upper bound on the performance of the cascaded system is set by the fallback scheme and the selector. If that fallback is itself pure feedforward learning, it will still fail unpredictably and so cannot really serve as a safety net. If instead a traditional planning method optimizes or selects on top of the output trajectory, then the trajectory produced by the learning-based method is merely an initial solution to that optimization and search problem. So why not optimize and search for the trajectory directly?
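The cascade described above can be sketched in a few lines. This is only an illustrative toy, not any vendor's architecture: the distance-based `is_safe` check, the `margin` value, and the three hand-written trajectories are all assumptions made for the example.

```python
def min_gap(trajectory, obstacle):
    """Smallest Euclidean distance from any trajectory point to the obstacle."""
    ox, oy = obstacle
    return min(((x - ox) ** 2 + (y - oy) ** 2) ** 0.5 for x, y in trajectory)


def is_safe(trajectory, obstacle, margin=1.0):
    """Hypothetical hard safety check used by the selector."""
    return min_gap(trajectory, obstacle) >= margin


def select(candidates, obstacle, fallback):
    """Return the first candidate that passes the check, else the fallback."""
    for name, traj in candidates:
        if is_safe(traj, obstacle):
            return name, traj
    return "fallback", fallback


obstacle = (10.0, 0.0)
# Stand-in for a learned planner's output: drives straight through the obstacle.
learned = [(float(x), 0.0) for x in range(0, 21, 5)]
# Stand-in for a traditional planner's output: shifts 2 m sideways.
classical = [(float(x), 0.0 if x < 5 else 2.0) for x in range(0, 21, 5)]
# Stand-in for a fallback: brake to a stop well before the obstacle.
fallback = [(0.0, 0.0), (2.0, 0.0), (3.0, 0.0)]

choice, _ = select([("learned", learned), ("classical", classical)],
                   obstacle, fallback)
only_learned, _ = select([("learned", learned)], obstacle, fallback)
print(choice, only_learned)
```

Whatever the learned planner proposes, what the vehicle finally executes is decided by `is_safe`, `select`, and the fallback, which is why the performance ceiling of such a cascade sits with those components.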

Of course, some will object that such an optimization or search problem is non-convex and that the state space is too large to run in real time on the vehicle. I ask everyone to think carefully about this: over the past 10 years, the perception system has enjoyed at least a 100x dividend in computing power, but what about the PnC module? If we also let the PnC module use large amounts of compute, combined with recent advances in optimization algorithms, does that conclusion still hold? On questions like this we should not rest on our laurels or fall into path dependence, but reason from first principles about what is right.


How to reconcile the relationship between data-driven and traditional methods?

In fact, a very close analogy to autonomous driving is chess. Just this February, DeepMind published a paper (Grandmaster-Level Chess Without Search: https://arxiv.org/abs/2402.04494) exploring whether it is feasible to rely on data alone and drop the MCTS search used in AlphaGo and AlphaZero. The autonomous driving analogue is a single network that outputs actions directly, with all subsequent steps discarded. The paper concludes that with considerable data and model scale, reasonable results can be obtained without search, but a very significant gap remains compared with methods that add search. (The comparison in the paper is actually not quite fair; the real gap should be even larger.) Pure data-driven performance is especially poor on difficult endgames. Carried over to autonomous driving, this means that in hard scenarios or corner cases that require multi-step game-theoretic interaction, it is still difficult to abandon traditional optimization or search algorithms completely. Making sensible use of the strengths of each technique, as AlphaZero does, is the most efficient way to improve performance.
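The gap between a policy alone and a policy backed by search can be seen even in a toy game. The sketch below uses a small Nim variant (take 1 to 3 stones, whoever takes the last stone wins) rather than chess, and `policy_move` is a deliberately naive stand-in for an imperfect learned policy; neither is taken from the paper.

```python
from functools import lru_cache

MOVES = (1, 2, 3)


def policy_move(pile):
    """Stand-in for an imperfect learned policy: always take as many as allowed."""
    return min(3, pile)


@lru_cache(maxsize=None)
def wins(pile):
    """True if the player to move can force a win from this pile size."""
    return any(m <= pile and not wins(pile - m) for m in MOVES)


def search_move(pile):
    """Exhaustive search: pick a move that leaves the opponent in a lost state."""
    for m in MOVES:
        if m <= pile and not wins(pile - m):
            return m
    return policy_move(pile)  # no winning move exists, so defer to the policy


pile = 5
after_policy = pile - policy_move(pile)   # stones left for the opponent
after_search = pile - search_move(pile)
print(after_policy, wins(after_policy), after_search, wins(after_search))
```

From a pile of 5, the naive policy hands the opponent a winning position, while the search finds the move that does not. The point of the analogy is only that multi-step lookahead can recover what a single forward pass misses.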

Traditional method = rule-based if-else?

This is another idea I have had to correct repeatedly in conversations. By many people's definition, anything that is not purely data-driven is rule-based. Take chess again: memorizing opening formulas and game records by rote is rule-based, but giving a model reasoning ability through search and optimization, as AlphaGo and AlphaZero do, cannot in my view be called rule-based. This is precisely what current large models lack, and what researchers are trying to give learning-based models through CoT (chain-of-thought) and similar methods. Moreover, unlike purely data-driven tasks such as image recognition, where the reasons cannot be described clearly, every action a human driver takes has a clear motivation. Under a suitable algorithm architecture, the decision and the trajectory should become variables that are optimized jointly under a principled objective, rather than being patched and parameter-tuned case by case. Such a system naturally has no need for strange hardcoded rules.
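As a rough illustration of treating the trajectory as optimization variables under one objective, the sketch below optimizes the lateral offsets of a short path with plain gradient descent. The cost terms, the weights, the obstacle position, and the use of finite-difference gradients are all assumptions made for the example; a real planner would use a far richer model and solver.

```python
import math

N = 11                                      # trajectory points at x_i = i metres
OBSTACLE = (5.0, -0.3)                      # hypothetical obstacle, slightly off-centre
W_SMOOTH, W_CENTRE, W_OBS = 1.0, 0.1, 5.0   # hypothetical weights


def cost(ys):
    """Single objective: smoothness + lane centring + soft obstacle penalty."""
    smooth = sum((ys[i + 1] - ys[i]) ** 2 for i in range(N - 1))
    centre = sum(y * y for y in ys)
    obs = sum(math.exp(-((i - OBSTACLE[0]) ** 2 + (y - OBSTACLE[1]) ** 2) / 2.0)
              for i, y in enumerate(ys))
    return W_SMOOTH * smooth + W_CENTRE * centre + W_OBS * obs


def optimise(ys, steps=200, lr=0.05, eps=1e-5):
    """Gradient descent with central finite differences; endpoints stay fixed."""
    ys = list(ys)
    for _ in range(steps):
        grad = [0.0] * N
        for i in range(1, N - 1):
            hi = ys[:i] + [ys[i] + eps] + ys[i + 1:]
            lo = ys[:i] + [ys[i] - eps] + ys[i + 1:]
            grad[i] = (cost(hi) - cost(lo)) / (2 * eps)
        ys = [y - lr * g for y, g in zip(ys, grad)]
    return ys


initial = [0.0] * N          # start on the lane centre line
result = optimise(initial)
print(round(cost(initial), 3), round(cost(result), 3))
```

There is no `if obstacle_ahead: swerve` rule anywhere in this sketch: the sideways shift around the obstacle emerges from minimizing the objective, and changing the behavior means changing a weight or a cost term rather than adding a special case.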

Summary

Finally, end-to-end may be a promising technical route, but how to put the concept into practice still needs a great deal of exploration. Is piling up data and model parameters the only correct solution? In my view, not at the moment. As people doing cutting-edge research, we should genuinely pursue the first principles and engineering mindset that Musk talks about and think about the essence of the problem from practice, rather than turning Musk himself into the first principle. If you really want to be far ahead, you cannot stop thinking and simply echo others; otherwise you will forever be trying to overtake on the corners.


This article is reproduced from 51CTO.COM.
