Review! A comprehensive summary of the important role of foundation models in advancing autonomous driving

WBOY, Original: 2024-06-11 17:29:58

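Preface & the author's personal understanding

Recently, with the development and breakthroughs of deep learning, large-scale foundation models have achieved remarkable results in natural language processing and computer vision. Foundation models also hold great promise in autonomous driving, where they can improve scene understanding and reasoning.

By pre-training on rich language and visual data, foundation models can understand and interpret the various elements of an autonomous driving scene and reason about them, providing language and action commands for driving decisions and planning.
Based on their understanding of driving scenes, foundation models can perform data augmentation, supplying rare but feasible long-tail scenarios that are unlikely to be encountered during routine driving and data collection, with the aim of improving the accuracy and reliability of autonomous driving systems.
Another application of foundation models is the world model, which demonstrates the ability to understand physical laws and dynamic objects. By learning from massive data under a self-supervised paradigm, a world model can generate unseen yet plausible driving scenarios, improving behavior prediction for dynamic objects and supporting offline training of driving policies.

This article surveys the applications of foundation models in autonomous driving along three lines: foundation models for autonomous driving models, foundation models for data augmentation, and world models for autonomous driving. For autonomous driving models, foundation models can be used to implement functions such as perception, decision-making, and control: through them, the vehicle obtains information about its surroundings and makes the corresponding decisions and control actions. For data augmentation, foundation models can be used to augment data.

Paper link: https://arxiv.org/pdf/2405.02288

Autonomous driving models

Human-like driving based on language and vision foundation models

In autonomous driving, language and vision foundation models show great potential: by strengthening an autonomous driving model's understanding of and reasoning about driving scenes, they enable human-like driving. The figure below shows how language- and vision-based foundation models understand a driving scene and infer language guidance and driving actions.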

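Paradigm for enhancing autonomous driving models with foundation models

Many works have already shown that language and visual features can effectively strengthen a model's understanding of driving scenes. Once it has an overall perception of the current environment, the foundation model issues a series of language commands, such as "Red light ahead, slow down" or "Intersection ahead, watch for pedestrians", so that the autonomous vehicle can carry out the final driving behavior according to those commands.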
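
As a rough illustration of that last step, the sketch below maps a language command of the kind quoted above to a coarse driving action. The `DrivingAction` type, the keyword table, and the numeric values are all hypothetical and only show the shape of the interface; the surveyed systems learn this mapping rather than hard-coding it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrivingAction:
    """Coarse action handed to a downstream planner (hypothetical type)."""
    target_speed_scale: float  # multiplier on the current speed, 1.0 = unchanged
    alert: str                 # what the planner should watch for


# Hypothetical keyword table, checked in order; a real system would learn this.
COMMAND_RULES = [
    ("red light", DrivingAction(0.0, "stop line")),
    ("slow down", DrivingAction(0.5, "none")),
    ("pedestrian", DrivingAction(0.7, "pedestrians")),
]

DEFAULT_ACTION = DrivingAction(1.0, "none")


def command_to_action(command: str) -> DrivingAction:
    """Return the action of the first rule whose keyword appears in the command."""
    text = command.lower()
    for keyword, action in COMMAND_RULES:
        if keyword in text:
            return action
    return DEFAULT_ACTION


action = command_to_action("Intersection ahead, watch for pedestrians")
print(action)
```

Keeping the command-to-action step explicit like this makes the decision easy to inspect, which is the interpretability benefit the language-guided approaches aim for.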

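In recent years, academia and industry have embedded GPT's language knowledge into the decision-making process of autonomous driving, using language commands to improve driving performance and to promote the application of large models in autonomous driving. Since large models are expected to be deployed on the vehicle itself, their output must ultimately land on planning or control commands; that is, foundation models should eventually empower autonomous driving at the level of actions and states. Some researchers have made preliminary explorations, but much room for development remains. More importantly, some researchers have explored building autonomous driving models with GPT-like methods, which directly output trajectories from a large language model that are then realized through control commands. The related work is summarized in the table below.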

[Table: related work on language and vision foundation models for autonomous driving]

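End-to-end autonomous driving with pre-trained backbone networks

The core idea of the work above is to improve the interpretability of autonomous driving decisions, strengthen scene understanding, and guide the planning or control of the autonomous driving system. Meanwhile, many works have been optimizing pre-trained backbone networks in various ways and have achieved very good results. Therefore, to summarize the applications of foundation models in autonomous driving more comprehensively, we also review research on pre-trained backbone networks. The figure below shows the overall process of end-to-end autonomous driving.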

Flow chart of an end-to-end autonomous driving system based on a pre-trained backbone network

In the overall pipeline of end-to-end autonomous driving, how well low-level information is extracted from raw data determines, to a certain extent, the ceiling of downstream model performance. A strong pre-trained backbone gives the model better feature learning capabilities. Pre-trained convolutional networks such as ResNet and VGG are the most widely used backbones for visual feature extraction in end-to-end models. These networks are usually pre-trained on tasks such as object detection or segmentation to extract generalizable features, and their performance has been verified in many works.

Early end-to-end autonomous driving models were mainly based on various convolutional neural networks and were trained through imitation learning or reinforcement learning. Some recent works have instead built end-to-end autonomous driving systems on Transformer architectures and have also achieved relatively good results, for example Transfuser, FusionAD, and UniAD.
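
To make the imitation learning idea concrete, here is a minimal behavior-cloning sketch: a policy is fitted to expert (state, action) pairs by least squares. The one-dimensional state (lateral offset from the lane center), the linear policy, and the expert data are assumptions chosen for brevity; the models named above use deep networks on sensor input instead.

```python
def fit_linear_policy(states, actions):
    """Least-squares fit of action = gain * state + bias to expert pairs."""
    n = len(states)
    mean_s = sum(states) / n
    mean_a = sum(actions) / n
    cov = sum((s - mean_s) * (a - mean_a) for s, a in zip(states, actions))
    var = sum((s - mean_s) ** 2 for s in states)
    gain = cov / var
    bias = mean_a - gain * mean_s
    return gain, bias


# Hypothetical expert demonstrations: steer against the lateral offset (metres).
expert_states = [-1.0, -0.5, 0.0, 0.5, 1.0]
expert_actions = [0.4, 0.2, 0.0, -0.2, -0.4]

gain, bias = fit_linear_policy(expert_states, expert_actions)


def policy(lateral_offset):
    """Cloned policy: imitates the expert on states it has not seen."""
    return gain * lateral_offset + bias


print(policy(0.25))
```

The cloned policy only ever sees what the expert did, which is why imitation learning depends so heavily on how well the demonstrations cover the scenarios the vehicle will meet.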

Data augmentation

With the continued development of deep learning and the steady improvement of underlying network architectures, foundation models built on pre-training and fine-tuning have shown increasingly powerful performance. Foundation models represented by GPT have shifted large models from a rule-based learning paradigm to a data-driven one, and the importance of data as a key link in model learning is irreplaceable. During the training and testing of autonomous driving models, large amounts of scene data are used so that the model can understand and make decisions in a wide range of road and traffic scenarios. The long-tail problem in autonomous driving arises because unknown edge scenarios are endless: the model's generalization ability never seems sufficient, and performance in those scenarios is poor.

Data augmentation is crucial to improving the generalization ability of autonomous driving models. Implementing it requires considering two aspects:

On the one hand: how to obtain large-scale data, so that the data provided to the autonomous driving model is sufficiently diverse and extensive
On the other hand: how to obtain as much high-quality data as possible, so that the data used for training and testing autonomous driving models is accurate and reliable

Accordingly, related research follows two technical directions. The first is to enrich the content of existing data sets and enhance the data features of driving scenarios. The second is to generate multi-level driving scenarios through simulation.

Extending autonomous driving data sets

Existing autonomous driving data sets are mainly obtained by recording sensor data and then labeling it. The data features obtained this way are usually very low-level, and the data sets are relatively small in scale, far from sufficient to cover the visual feature space of autonomous driving scenarios. The advanced semantic understanding, reasoning, and interpretation capabilities of foundation models, represented by language models, provide new ideas and technical approaches for enriching and expanding autonomous driving data sets. Expanding data sets with these capabilities helps to better evaluate the explainability and control of autonomous driving systems, thereby improving their safety and reliability.

Generating driving scenes

Driving scenes are of great significance to autonomous driving. Obtaining diverse driving scene data by relying only on real-time collection from vehicle sensors is very costly, and for some edge cases it is difficult to collect enough data at all. Generating realistic driving scenes through simulation has therefore attracted the attention of many researchers. Traffic simulation research falls mainly into two categories: rule-based and data-driven.

Rule-based approach: uses predefined rules, which are often insufficient to describe complex driving scenarios, so the simulated scenarios tend to be simple and generic
Data-driven approach: uses driving data to train a model that can continuously learn and adapt. However, data-driven methods usually require a large amount of labeled data for training, which hinders the further development of traffic simulation
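
The rule-based category can be sketched in a few lines. The example below simulates a follower vehicle behind a lead vehicle with one fixed rule: brake when the gap is shorter than a time-gap threshold, otherwise accelerate up to a speed limit. The rule and all parameter values are assumptions for illustration, not taken from any particular simulator; a data-driven approach would instead fit the behavior from logged trajectories.

```python
def rule_based_step(gap_m, speed_mps, dt=1.0, safe_time_gap_s=2.0,
                    accel_mps2=1.0, brake_mps2=3.0, speed_limit_mps=15.0):
    """Advance the follower's speed by one time step using a fixed rule."""
    safe_gap = speed_mps * safe_time_gap_s
    if gap_m < safe_gap:
        # Too close: brake, but never reverse.
        return max(0.0, speed_mps - brake_mps2 * dt)
    # Enough room: accelerate up to the speed limit.
    return min(speed_limit_mps, speed_mps + accel_mps2 * dt)


def simulate(lead_speed_mps, follower_speed_mps, gap_m, steps, dt=1.0):
    """Roll the rule forward behind a lead vehicle moving at constant speed."""
    history = []
    for _ in range(steps):
        follower_speed_mps = rule_based_step(gap_m, follower_speed_mps, dt)
        gap_m += (lead_speed_mps - follower_speed_mps) * dt
        history.append((round(gap_m, 2), round(follower_speed_mps, 2)))
    return history


for gap, speed in simulate(10.0, 15.0, 20.0, steps=5):
    print(gap, speed)
```

A single threshold rule like this produces the same reaction in every situation, which illustrates why rule-based scenarios tend to be simple and generic.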

As the technology has developed, data generation has gradually shifted from rule-based to data-driven approaches. Efficiently and accurately simulating driving scenarios, including complex and dangerous situations, provides a large amount of training data for model learning and can effectively improve the generalization ability of autonomous driving systems. The generated scenarios can also be used to evaluate different autonomous driving systems and algorithms, testing and verifying their performance. The following table summarizes different data augmentation strategies.

Summary of different data augmentation strategies

World Model

A world model is an artificial intelligence model that holds an overall understanding or representation of the environment in which it operates, and that can simulate this environment to make predictions or decisions. In recent literature, the term "world model" appears mostly in the context of reinforcement learning. The concept is also gaining traction in autonomous driving because of its ability to understand and explain the dynamics of the driving environment. World models are closely related to reinforcement learning, imitation learning, and deep generative models. However, using world models in reinforcement learning and imitation learning usually requires well-labeled data, and methods such as SEM2 and MILE are trained under a supervised paradigm. There are also attempts to combine reinforcement learning with unsupervised learning to get around the limits of labeled data. Because of their close association with self-supervised learning, deep generative models have become increasingly popular, and many works have been proposed. The figure below shows the overall flow chart of using a world model to enhance an autonomous driving model.
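
The phrase "simulate the environment to make predictions or decisions" can be illustrated with a toy world model. In the sketch below, the model learns the average effect of each action from logged transitions, then rolls out imagined trajectories to pick an action without touching the real environment. The tabular model, the scalar state (vehicle speed), and the action names are hypothetical simplifications; the world models surveyed here are deep networks operating on images or other sensor data.

```python
from collections import defaultdict


class TabularWorldModel:
    """Learns the average state change caused by each action from logged transitions."""

    def __init__(self):
        self._delta_sum = defaultdict(float)
        self._count = defaultdict(int)

    def update(self, state, action, next_state):
        # The observed next state is the only training signal: no labels needed.
        self._delta_sum[action] += next_state - state
        self._count[action] += 1

    def predict(self, state, action):
        if self._count[action] == 0:
            return state  # no data for this action: assume nothing changes
        return state + self._delta_sum[action] / self._count[action]

    def imagine(self, state, actions):
        """Roll out a sequence of actions inside the model only."""
        trajectory = [state]
        for action in actions:
            state = self.predict(state, action)
            trajectory.append(state)
        return trajectory


def plan(model, state, goal, candidate_actions, horizon):
    """Pick the action whose imagined constant-action rollout ends closest to the goal."""
    return min(
        candidate_actions,
        key=lambda a: abs(model.imagine(state, [a] * horizon)[-1] - goal),
    )


model = TabularWorldModel()
# Hypothetical driving log: (speed, action, next speed) in m/s.
for s, a, s_next in [(10.0, "accelerate", 11.0), (11.0, "accelerate", 12.0),
                     (12.0, "brake", 10.0), (10.0, "hold", 10.0)]:
    model.update(s, a, s_next)

print(plan(model, 10.0, 13.0, ["accelerate", "brake", "hold"], horizon=3))
```

Because planning happens inside the learned model, candidate behaviors can be evaluated offline, which is the property that makes world models attractive for training driving policies.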

Overall flow chart of world model enhancement for autonomous driving model

Deep generative model

Deep generative models typically include variational autoencoders, generative adversarial networks, flow models, autoregressive models, and diffusion models.

The variational autoencoder combines the ideas of autoencoders and probabilistic graphical models to learn the underlying structure of the data and generate new samples
The generative adversarial network consists of two neural networks, a generator and a discriminator, which compete with and improve each other through adversarial training, with the goal of generating realistic samples
The flow model transforms a simple prior distribution into a complex distribution that matches the data through a series of invertible transformations, and generates similar data samples
The autoregressive model is a sequence analysis method that describes the relationship between current and past observations based on the autocorrelation in sequence data. Its parameters are usually estimated by least squares or maximum likelihood
The diffusion model learns a stepwise denoising process that starts from pure noise. It is usually treated as its own family rather than as an autoregressive model, although both generate a sample through a sequence of steps. Thanks to its strong generative performance, it is a current state of the art (SOTA) among deep generative models
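
The least-squares estimation mentioned for autoregressive models can be shown on the simplest case, a first-order model x_t = phi * x_{t-1} + noise. The sketch below is a minimal illustration under that assumption; the function names and the simulated series are hypothetical, and deep autoregressive generators replace the single coefficient with a neural network.

```python
import random


def simulate_ar1(phi, n, noise_std, seed=0):
    """Generate x_t = phi * x_{t-1} + Gaussian noise, starting from x_0 = 0."""
    rng = random.Random(seed)
    series = [0.0]
    for _ in range(n - 1):
        series.append(phi * series[-1] + rng.gauss(0.0, noise_std))
    return series


def fit_ar1(series):
    """Least-squares estimate of phi: sum(x_t * x_{t-1}) / sum(x_{t-1}^2)."""
    numerator = sum(cur * prev for prev, cur in zip(series, series[1:]))
    denominator = sum(prev * prev for prev in series[:-1])
    return numerator / denominator


series = simulate_ar1(phi=0.8, n=5000, noise_std=1.0)
print(fit_ar1(series))
```

The estimate minimizes the squared one-step prediction error, so the fitted coefficient is exactly the dependence of the current observation on the previous one that the description above refers to.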

Generative methods

Given the powerful capabilities of deep generative models, using them as world models that learn driving scenarios to enhance autonomous driving has gradually become a research hotspot. Here we review the use of deep generative models as world models in autonomous driving. Vision is one of the most direct and effective ways for humans to obtain information about the world, because image data contains extremely rich feature information. Many previous works have accomplished image generation with world models, showing that world models can understand and reason about image data well. Overall, researchers hope to learn the inherent laws by which the world evolves from image data and then predict future states. Combined with self-supervised learning, world models trained on image data can fully release the model's reasoning capabilities and offer a feasible direction for building a general foundation model in the visual domain. The figure below summarizes some related work using world models.

Summary of work using world models for prediction

Non-generative methods

In contrast to generative world models, LeCun set out a different conception of world models by proposing the Joint Embedding Predictive Architecture (JEPA). It is a non-generative, self-supervised architecture: rather than predicting the output directly from the input data, it encodes the input in an abstract space and makes the final prediction there. The advantage of this approach is that it does not need to predict every detail of the output and can discard irrelevant details.

JEPA is a self-supervised learning architecture based on energy-based models that observes and learns how the world works, along with highly generalized laws. JEPA also has great potential in autonomous driving and is expected to produce high-quality driving scenarios and driving strategies by learning how driving works.
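
The difference between predicting in an abstract space and predicting the raw output can be shown with a toy example. In the sketch below, each observation is a pair (predictable signal, unpredictable nuisance detail). The encoder and predictor are hand-written stand-ins and entirely hypothetical; in an actual JEPA both are learned jointly, with extra measures to keep the embeddings from collapsing to a constant.

```python
def encode(observation):
    """Hypothetical encoder: keep the predictable part, drop the nuisance detail."""
    signal, _nuisance = observation
    return signal


def predict_embedding(context_embedding, step=1.0):
    """Hypothetical predictor that operates purely in embedding space."""
    return context_embedding + step


def embedding_space_loss(context, target):
    """JEPA-style objective: compare predicted and actual target embeddings."""
    predicted = predict_embedding(encode(context))
    return (predicted - encode(target)) ** 2


def input_space_loss(context, target):
    """Generative-style objective: reconstruct every component of the target."""
    predicted_signal = predict_embedding(encode(context))
    predicted_nuisance = context[1]  # best available guess: copy the old detail
    return (predicted_signal - target[0]) ** 2 + (predicted_nuisance - target[1]) ** 2


context = (2.0, 0.3)   # (signal, nuisance) at the current step
target = (3.0, -0.9)   # signal advanced by one step, nuisance changed arbitrarily

print(embedding_space_loss(context, target))
print(input_space_loss(context, target))
```

The embedding-space loss ignores the nuisance component entirely, while the input-space loss is penalized for a detail that cannot be predicted; this is the sense in which a non-generative architecture can discard irrelevant details.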

Conclusion

This article provides a comprehensive overview of the important role of foundation models in autonomous driving applications. Judging from the research surveyed here, one direction worthy of further exploration is how to design an effective network architecture for self-supervised learning. Self-supervised learning can break through the limitations of data annotation, allowing the model to learn from data at large scale and fully unleash its reasoning capabilities. If a foundation model for autonomous driving can be trained on driving scene data of different scales under a self-supervised paradigm, its generalization ability is expected to improve greatly. Such advances may enable a more general foundation model.

In short, although applying foundation models to autonomous driving faces many challenges, the application space is broad and the development prospects are promising. We will continue to follow the progress of foundation models applied to autonomous driving.
