Berkeley open-sources the first high-definition dataset and prediction model for parking scenarios, supporting target recognition and trajectory prediction
As autonomous driving technology continues to iterate, predicting vehicle behavior and trajectories has become essential for efficient, safe driving. Traditional trajectory prediction methods, such as dynamic-model rollout and reachability analysis, have the advantages of a clear formulation and strong interpretability, but in complex traffic environments their ability to model interactions between the environment and other agents is limited. In recent years, therefore, a large body of research and applications has been built on deep learning methods (LSTM, CNN, Transformer, GNN, and so on), and datasets such as BDD100K, nuScenes, Stanford Drone, ETH/UCY, INTERACTION, and ApolloScape have emerged, providing strong support for training and evaluating deep neural network models. Many state-of-the-art models, such as GroupNet, Trajectron, and MultiPath, have shown good performance.
The models and datasets above concentrate on ordinary road driving scenarios, making full use of infrastructure features such as lane lines and traffic lights to assist prediction; because traffic regulations constrain driving behavior, the movement patterns of most vehicles are also relatively clear. However, the "last mile" of autonomous driving, the autonomous parking scenario, presents many new difficulties.
At the 25th IEEE International Conference on Intelligent Transportation Systems (IEEE ITSC 2022), held in October 2022, researchers from the University of California, Berkeley released the first high-definition video and trajectory dataset for parking scenes, and, building on this dataset, proposed a trajectory prediction model called "ParkPredict" based on CNN and Transformer architectures.
The dataset was collected by drone: 3.5 hours of footage in total, at 4K resolution and a 25 Hz sampling rate. The view covers a parking area of approximately 140 m x 80 m, containing roughly 400 parking spaces. The dataset is accurately annotated, collecting trajectories for a total of 1,216 motor vehicles, 3,904 bicycles, and 3,904 pedestrians.
After post-processing, the trajectory data can be read from JSON and loaded into a connected graph (Graph) data structure.
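The idea of loading JSON trajectories into a graph can be sketched as follows. The field names (`agents`, `instances`, `first_instance`, `next`) are illustrative assumptions, not the dataset's actual schema; the real files link records by token in a similar linked-list fashion.

```python
import json

# Hypothetical mini-sample mimicking the dataset's JSON layout:
# agents point at their first "instance" (a per-frame observation),
# and instances chain together via a "next" token.
sample = json.loads("""
{
  "agents": [
    {"token": "a1", "type": "vehicle", "first_instance": "i1"},
    {"token": "a2", "type": "pedestrian", "first_instance": "i3"}
  ],
  "instances": [
    {"token": "i1", "agent": "a1", "coords": [10.0, 5.0], "next": "i2"},
    {"token": "i2", "agent": "a1", "coords": [10.5, 5.1], "next": null},
    {"token": "i3", "agent": "a2", "coords": [3.0, 7.0], "next": null}
  ]
}
""")

def build_graph(data):
    """Connect each agent node to its chain of instance nodes,
    collecting the per-frame coordinates into a trajectory."""
    instances = {i["token"]: i for i in data["instances"]}
    graph = {}
    for agent in data["agents"]:
        traj, tok = [], agent["first_instance"]
        while tok is not None:  # walk the linked list of instances
            traj.append(instances[tok]["coords"])
            tok = instances[tok]["next"]
        graph[agent["token"]] = {"type": agent["type"], "trajectory": traj}
    return graph

graph = build_graph(sample)
```

Walking token chains like this recovers each agent's full trajectory without loading every frame up front.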
The dataset is offered in two download formats:
JSON only (recommended): the JSON files contain each agent's type, shape, trajectory, and other information, and can be read, previewed, and rendered into semantic images directly through the open-source Python API. If the research goal is only trajectory and behavior prediction, the JSON format meets all needs.
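To illustrate what "generating semantic images" from trajectory data means, here is a toy stand-in that rasterizes agent positions into a coarse occupancy grid. The real API renders richer multi-channel images (lanes, parking spots, agents); the grid resolution and function below are assumptions for illustration only.

```python
# Toy semantic-image generation: rasterize (x, y) positions in meters
# into an occupancy grid covering the ~140 m x 80 m parking area.
def rasterize(positions, width_m=140.0, height_m=80.0, res_m=10.0):
    cols, rows = int(width_m // res_m), int(height_m // res_m)
    grid = [[0] * cols for _ in range(rows)]
    for x, y in positions:
        c, r = int(x // res_m), int(y // res_m)
        if 0 <= r < rows and 0 <= c < cols:  # ignore out-of-bounds points
            grid[r][c] = 1
    return grid

# Two agents: one near the origin, one in the far corner of the lot.
grid = rasterize([(10.0, 5.0), (135.0, 79.0)])
```

A per-class channel (vehicles, pedestrians, static obstacles) instead of a single binary grid is the usual next step for feeding such images to a CNN.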
Original video and annotations: for machine-vision topics that work on the raw camera images, such as object detection, segmentation, and tracking, you may need to download the original video and annotations. In that case, the research needs must be clearly described in the dataset application. In addition, the annotation files must be parsed by the user.
Behavior and trajectory prediction model: ParkPredict. As an application example, in the IEEE ITSC 2022 paper "ParkPredict: Multimodal Intent and Motion Prediction for Vehicles in Parking Lots with CNN and Transformer", the research team used this dataset to predict vehicle intent (Intent) and trajectory (Trajectory) in parking-lot scenes based on CNN and Transformer architectures.
The team used a CNN model over constructed semantic images to predict the probability distribution of vehicle intent. The model needs only the vehicle's local environment information, and the number of candidate intents can vary with the current environment.
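The "variable number of intents" idea can be sketched with a plain softmax: the CNN scores however many candidate intents (e.g. nearby empty parking spots) appear in the local crop, and the softmax turns those scores into a distribution whose size changes with the scene. The scores below are made up for illustration; this is not the paper's network.

```python
import math

def intent_distribution(scores):
    """Numerically stable softmax over a variable-length score list."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Three candidate spots visible in this (hypothetical) local crop.
probs = intent_distribution([2.0, 1.0, 0.5])
```

Because the softmax is length-agnostic, the same head works whether the local map exposes two candidate spots or ten.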
The team adapted the Transformer model, providing the intent prediction results, the vehicle's motion history, and the semantic map of the surrounding environment as inputs, to achieve multimodal intent and motion prediction.
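How those three inputs might be packed into one sequence for a Transformer can be sketched as below. The ordering, dimensions, and `pack_inputs` function are assumptions for illustration, not the paper's actual interface.

```python
def pack_inputs(history, map_embedding, intent_probs):
    """Concatenate three input sources into one token sequence:
    history       -- list of (x, y, heading) past states
    map_embedding -- flat feature vector from the semantic-map CNN
    intent_probs  -- intent distribution from the intent head
    """
    tokens = [list(state) for state in history]  # one token per past state
    tokens.append(list(map_embedding))           # map-context token
    tokens.append(list(intent_probs))            # intent-condition token
    return tokens

# Two past states, plus (made-up) map and intent features.
seq = pack_inputs([(0.0, 0.0, 0.0), (0.5, 0.1, 0.05)],
                  [0.2, 0.8, 0.1],
                  [0.7, 0.3, 0.0])
```

Conditioning the decoder on the intent token is what makes the trajectory output multimodal: sampling different intents yields different predicted futures.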