Review! A comprehensive summary of the important role of foundation models in advancing autonomous driving
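Recently, with the development of and breakthroughs in deep learning, large-scale foundation models have achieved remarkable results in natural language processing and computer vision. Foundation models also hold considerable promise for autonomous driving, where they can improve scene understanding and reasoning.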
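This article surveys the application of foundation models in autonomous driving from three angles: their use in autonomous driving models, their use in data augmentation, and the use of world models, a class of foundation models, for autonomous driving. In autonomous driving models, foundation models can support a range of driving functions such as perception, decision-making, and control: through them, a vehicle obtains information about its surroundings and makes the corresponding decisions and control actions. In data augmentation, foundation models can be used to augment data.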
Paper link: https://arxiv.org/pdf/2405.02288
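In autonomous driving, language and vision foundation models show great potential: by strengthening a driving model's understanding of and reasoning about driving scenes, they help it drive in a human-like way. The figure below shows how language- and vision-based foundation models understand a driving scene and reason their way to language guidance instructions and driving behaviors.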
Paradigm of foundation models enhancing autonomous driving models
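Many studies have shown that language and visual features effectively improve a model's understanding of driving scenes. Once it has an overall perceptual understanding of the current environment, the foundation model issues a series of language commands, such as "Red light ahead, slow down" or "Intersection ahead, watch for pedestrians", so that the autonomous vehicle can carry out the final driving behavior according to those instructions.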
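The last step described above, turning a language command into a driving behavior, can be illustrated with a minimal sketch. It assumes a hypothetical keyword lookup in place of the foundation model's own output parsing; `DrivingBehavior`, `KEYWORD_RULES`, the maneuver names, and the speed scales are all invented for illustration and are not taken from the surveyed work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrivingBehavior:
    """Hypothetical high-level behavior handed to a downstream planner."""
    maneuver: str       # e.g. "decelerate", "keep_lane"
    speed_scale: float  # multiplier applied to the current target speed


# Assumed keyword rules. A real system would rely on structured output from
# the foundation model rather than on substring matching.
KEYWORD_RULES = [
    ("red light", DrivingBehavior("decelerate", 0.3)),
    ("slow down", DrivingBehavior("decelerate", 0.5)),
    ("pedestrian", DrivingBehavior("yield", 0.5)),
    ("intersection", DrivingBehavior("proceed_with_caution", 0.7)),
]
DEFAULT_BEHAVIOR = DrivingBehavior("keep_lane", 1.0)


def command_to_behavior(command: str) -> DrivingBehavior:
    """Return the most conservative behavior whose keyword appears in the command."""
    text = command.lower()
    matches = [behavior for keyword, behavior in KEYWORD_RULES if keyword in text]
    if not matches:
        return DEFAULT_BEHAVIOR
    # The lowest speed scale is treated as the most conservative choice.
    return min(matches, key=lambda behavior: behavior.speed_scale)
```

Calling `command_to_behavior("Red light ahead, slow down")` selects the decelerating behavior, and a command with no known keyword falls back to `DEFAULT_BEHAVIOR`. Picking the lowest speed scale among the matches is one simple way to resolve commands that mention several hazards at once.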
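In recent years, academia and industry have embedded the language knowledge of GPT into the decision-making process of autonomous driving, using language commands to improve driving performance and to promote the application of large models in autonomous driving. Since large models are expected to be deployed on the vehicle itself, their output must ultimately land on planning or control commands; in other words, foundation models should ultimately empower autonomous driving at the level of actions and states. Some researchers have made preliminary explorations, but there is still much room for development. More importantly, some researchers have explored building autonomous driving models with GPT-like methods, which directly output trajectories from a large language model and then realize them through control commands. The related work is summarized in the table below.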
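The core idea of the work above is to improve the interpretability of autonomous driving decisions, strengthen scene understanding and parsing, and guide the planning or control of the autonomous driving system. Over the same period, many studies have optimized pre-trained backbone networks in various ways and achieved very good results. To summarize the application of foundation models in autonomous driving more completely, we therefore also review research on pre-trained backbone networks. The figure below shows the overall pipeline of end-to-end autonomous driving.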
Flowchart of an end-to-end autonomous driving system based on a pre-trained backbone network
In the end-to-end autonomous driving pipeline, how well low-level information is extracted from raw data determines, to a certain extent, the ceiling on downstream model performance. A good pre-trained backbone gives the model stronger feature learning capabilities. Pre-trained convolutional networks such as ResNet and VGG are the most widely used backbones for visual feature extraction in end-to-end models. These networks are usually pre-trained on object detection or segmentation so that they extract general-purpose features, and their performance has been verified in many works.
In addition, early end-to-end autonomous driving models were mainly built on various convolutional neural networks and trained through imitation learning or reinforcement learning. Some recent work, such as Transfuser, FusionAD, and UniAD, has instead built end-to-end autonomous driving systems on Transformer architectures and achieved relatively good results.
With the continued development of deep learning and upgrades to the underlying network architectures, foundation models built on pre-training and fine-tuning have shown increasingly powerful performance. Foundation models represented by GPT have shifted the learning paradigm of large models from rule-based to data-driven, and data has become an irreplaceable part of model learning. During the training and testing of autonomous driving models, large amounts of scene data are used so that the model can understand and make decisions in a wide range of road and traffic scenarios. The long-tail problem in autonomous driving stems from the endless supply of unknown edge scenarios: the model's generalization ability never seems to be enough, and performance suffers as a result.
Data augmentation is crucial to improving the generalization ability of autonomous driving models, and its implementation has to consider two aspects. Related research accordingly follows two technical directions: first, enriching the data content of existing datasets and strengthening data features for driving scenarios; second, generating multi-level driving scenarios through simulation.
Existing autonomous driving datasets are mainly obtained by recording sensor data and then labeling it. The features obtained this way are usually very low-level, and the datasets are relatively small, which is far from sufficient to cover the visual feature space of autonomous driving scenarios. The advanced semantic understanding, reasoning, and interpretation capabilities of foundation models, represented by language models, offer new ideas and technical approaches for enriching and expanding autonomous driving datasets. Expanding datasets with these capabilities can help better evaluate the explainability and control of autonomous driving systems, thereby improving their safety and reliability.
Driving scenes are of great significance to autonomous driving. Collecting diverse driving scene data in real time with a vehicle's sensors alone is very costly, and for some edge scenes it is difficult to obtain enough data at all. Generating realistic driving scenes through simulation has therefore attracted many researchers. Traffic simulation research falls mainly into two categories: rule-based and data-driven.
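To give a feel for the rule-based category, here is a minimal sketch of a hand-written car-following rule. It is a toy under stated assumptions, not a model from the surveyed literature: the function name `simulate_follower`, the on/off acceleration rule, and every parameter value are invented for illustration.

```python
def simulate_follower(lead_positions, dt=0.1, start_position=0.0, start_speed=10.0,
                      base_gap=5.0, time_headway=1.5, max_accel=2.0, max_brake=4.0):
    """Simulate one follower vehicle behind a leader using a fixed rule.

    lead_positions: leader position (m) at each time step.
    Returns a list of (position, speed) tuples for the follower.
    """
    position, speed = start_position, start_speed
    trajectory = []
    for lead_position in lead_positions:
        gap = lead_position - position
        # Assumed rule: the desired gap grows with the follower's own speed.
        desired_gap = base_gap + time_headway * speed
        # Accelerate while the gap is larger than desired, otherwise brake.
        accel = max_accel if gap > desired_gap else -max_brake
        speed = max(0.0, speed + accel * dt)  # the vehicle never reverses
        position += speed * dt
        trajectory.append((position, speed))
    return trajectory


# Example: the leader is parked 50 m ahead for 100 steps (10 s).
trajectory = simulate_follower([50.0] * 100)
```

Every behavior in this simulation comes from the single rule in the loop, which is what makes rule-based simulators easy to inspect but hard to scale to rich, realistic traffic. Data-driven approaches learn such behavior from recorded driving data instead.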
As the technology has developed, data generation has gradually shifted from a rule-based approach to a data-driven approach. Efficiently and accurately simulating driving scenarios, including complex and dangerous situations, provides a large amount of training data for model learning and can effectively improve the generalization ability of the autonomous driving system. The generated driving scenarios can also be used to evaluate different autonomous driving systems and algorithms, testing and verifying system performance. The following table summarizes different data augmentation strategies.
Summary of different data augmentation strategies
A world model is an artificial intelligence model that holds an overall understanding or representation of the environment in which it operates, and that can simulate this environment to make predictions or decisions. In recent literature, the term "world model" appears mostly in the context of reinforcement learning. The concept is also gaining traction in autonomous driving because of its ability to understand and explain the dynamics of the driving environment. World models are closely related to reinforcement learning, imitation learning, and deep generative models. However, using world models in reinforcement learning and imitation learning usually requires well-labeled data, and methods such as SEM2 and MILE operate in the supervised paradigm. Given the limitations of labeled data, there are also attempts to combine reinforcement learning with unsupervised learning. Because of their close association with self-supervised learning, deep generative models have become increasingly popular, and many such methods have been proposed. The figure below shows the overall flowchart of using a world model to enhance an autonomous driving model.
Overall flowchart of using world models to enhance autonomous driving models
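The idea of a model that simulates its environment to make predictions or decisions can be sketched in a few lines. This is a deliberately tiny illustration, assuming a one-dimensional state (vehicle speed), a linear transition model, and a fixed-action rollout. The function names and the data are hypothetical, and the world models in the surveyed work are deep networks rather than a two-parameter fit.

```python
def fit_linear_world_model(transitions):
    """Least-squares fit of next_state = a * state + b * action.

    transitions: iterable of (state, action, next_state) tuples.
    Returns the coefficients (a, b).
    """
    transitions = list(transitions)
    sum_ss = sum(s * s for s, u, n in transitions)
    sum_uu = sum(u * u for s, u, n in transitions)
    sum_su = sum(s * u for s, u, n in transitions)
    sum_sn = sum(s * n for s, u, n in transitions)
    sum_un = sum(u * n for s, u, n in transitions)
    # Solve the 2x2 normal equations with Cramer's rule.
    det = sum_ss * sum_uu - sum_su * sum_su
    a = (sum_sn * sum_uu - sum_su * sum_un) / det
    b = (sum_ss * sum_un - sum_su * sum_sn) / det
    return a, b


def imagine(model, state, actions):
    """Roll the learned model forward without touching the real environment."""
    a, b = model
    for action in actions:
        state = a * state + b * action
    return state


def choose_action(model, state, target, candidates, horizon=5):
    """Pick the constant action whose imagined outcome lands closest to target."""
    return min(candidates,
               key=lambda u: abs(imagine(model, state, [u] * horizon) - target))


# Hypothetical logged data from an environment where speed changes by 0.1 * action.
logged = [(s, u, s + 0.1 * u) for s in (0.0, 5.0, 10.0) for u in (-2.0, 0.0, 2.0)]
model = fit_linear_world_model(logged)
best = choose_action(model, state=10.0, target=11.0, candidates=[-2.0, 0.0, 2.0])
```

The planner never queries the real environment: it compares candidate actions purely through imagined rollouts of the learned model, which is the role a world model plays in the pipeline described above.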
Deep generative models typically include variational autoencoders, generative adversarial networks, flow models, and autoregressive models.
Given the powerful capabilities of deep generative models, using them as world models that learn driving scenarios to enhance autonomous driving has gradually become a research hotspot. Next we review the use of deep generative models as world models in autonomous driving. Vision is one of the most direct and effective ways for humans to obtain information about the world, because image data contains extremely rich feature information. Many previous works have performed image generation with world models, showing that world models can understand and reason about image data well. Overall, researchers hope to learn the inherent laws by which the world evolves from image data and then predict future states. Combined with self-supervised learning, world models learn from image data in a way that fully releases the model's reasoning capabilities and offers a feasible direction for building general-purpose foundation models in the visual domain. The figure below summarizes some related work that uses world models.
Summary of work using world models for prediction
In contrast to generative world models, LeCun laid out a different conception of world models by proposing the Joint Embedding Predictive Architecture (JEPA). It is a non-generative, self-supervised architecture: rather than predicting the output directly from the input data, it encodes the input into an abstract space and makes the final prediction there. The advantage of this approach is that it does not need to predict all the information in the output and can discard irrelevant details.
JEPA is a self-supervised learning architecture based on energy-based models: it observes the world to learn how it works and to extract highly general laws. JEPA also has great potential in autonomous driving, where it is expected to generate high-quality driving scenarios and driving strategies by learning how driving works.
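The following toy sketch illustrates what predicting in an abstract space means. It is not LeCun's architecture: the encoder and predictor are hand-written stand-ins for learned networks, and the observation fields (`position`, `speed`, `texture`) are hypothetical. It only shows the structural point that the energy compares embeddings, so details the encoder drops never need to be predicted.

```python
def encode(observation):
    """Stand-in for a learned encoder: keep position and speed, and drop the
    'texture' field, which is irrelevant to how the scene evolves."""
    return (observation["position"], observation["speed"])


def predict(context_embedding, dt):
    """Stand-in for a learned predictor: advance the embedding in time,
    assuming constant speed."""
    position, speed = context_embedding
    return (position + speed * dt, speed)


def energy(context_obs, target_obs, dt):
    """Squared distance between the predicted and the actual target embedding.
    Low energy means the two observations are compatible."""
    predicted = predict(encode(context_obs), dt)
    actual = encode(target_obs)
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))


now = {"position": 2.0, "speed": 4.0, "texture": "wet asphalt"}
plausible = {"position": 4.0, "speed": 4.0, "texture": "dry asphalt"}
implausible = {"position": 10.0, "speed": 4.0, "texture": "wet asphalt"}

low = energy(now, plausible, dt=0.5)     # texture differs, embedding matches
high = energy(now, implausible, dt=0.5)  # position is inconsistent with the motion
```

The plausible future gets low energy even though its texture changed, because the texture never enters the embedding. A generative model would instead have to reproduce that detail in its output.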
This article provides a comprehensive overview of the important role of foundation models in autonomous driving applications. Judging from the research surveyed here, one direction worth further exploration is how to design an effective network architecture for self-supervised learning. Self-supervised learning can break through the limitations of data annotation, allowing a model to learn from data at scale and fully unleash its reasoning capabilities. If a foundation model for autonomous driving can be trained on driving scene data of different scales under a self-supervised paradigm, its generalization ability is expected to improve greatly. Such advances may enable a more general foundation model.
In short, although applying foundation models to autonomous driving still faces many challenges, the approach has very broad application space and development prospects. We will continue to follow the progress of foundation models applied to autonomous driving.