
The latest from the University of California! CarDreamer: A comprehensive and flexible open source platform for autonomous driving algorithm testing

WBOY (Original)
2024-06-08 16:57:52

Preface & the Author's Perspective

To safely navigate complex real-world scenarios, autonomous vehicles must adapt to varied road conditions and anticipate future events. Reinforcement learning (RL) based on world models has emerged as a promising way to achieve this, by learning and predicting the complex dynamics of diverse environments. However, no accessible platform currently exists for training and testing such algorithms in complex driving environments. To fill this gap, this work introduces CarDreamer, the first open-source learning platform designed specifically for developing and evaluating world-model-based autonomous driving algorithms. It contains three key components:

1) World Model (WM) backbone: CarDreamer integrates several of the most advanced world models, simplifying the reproduction of RL algorithms. The backbone is decoupled from the rest of the platform and communicates through the standard Gym interface, so users can easily integrate and test their own algorithms.

2) Built-in tasks: CarDreamer provides a highly configurable set of driving tasks that are compatible with the Gym interface and come with empirically optimized reward functions.

3) Task development kit: CarDreamer offers a flexible task development suite that simplifies the creation of driving tasks. The suite makes it easy to define traffic flows and vehicle routes, and it automatically collects simulation data. A visualization server lets users monitor the agent's driving video and performance metrics in real time through a browser. Thanks to CarDreamer's rich functionality and flexibility, the impact of observation modality, observability, and vehicle intent sharing on AV safety and efficiency is also studied systematically.

Background

Autonomous vehicles are expected to play a central role in future mobility systems, with promising benefits in safety and efficiency. Their development has advanced considerably in recent years: in the United States alone, self-driving cars have already logged millions of miles on public roads. However, building robust autonomous vehicles that can navigate complex and diverse real-world scenarios remains a challenging frontier. According to the U.S. Department of Transportation's Federal Highway Administration, self-driving cars currently have a crash rate roughly twice that of conventional vehicles, although collision rates are expected to improve significantly as the technology matures. Achieving higher safety requires more advanced perception and decision-making capabilities. By leveraging advanced sensor technology and machine learning algorithms, autonomous vehicles can more accurately identify and predict the behavior of obstacles and surrounding vehicles. They can also improve traffic-flow efficiency by coordinating with transportation authorities: interconnection with traffic lights and other traffic facilities lets vehicles adjust speed and route in real time, reducing congestion.

The reliability of an autonomous driving system is directly determined by its ability to generalize to unpredictable scenarios. World Models (WM), with their strong generalization capabilities, offer a promising solution by learning the complex dynamics of the environment and predicting future scenarios. Specifically, a WM learns a compact latent representation that encodes the key elements and dynamics of the environment. This learned representation supports better generalization, enabling the WM to make predictions in scenarios beyond its training samples. Internally, a WM contains components that mimic human perception and decision-making, such as visual and memory models. Indeed, humans can take appropriate actions when encountering unknown or unseen events precisely because they maintain an internal model of the world. By simulating cognitive processes similar to human intelligence, WM-based reinforcement learning (RL) algorithms have achieved state-of-the-art performance in domains such as Atari games and Minecraft. However, applying WMs to autonomous driving remains an open area, partly because easy-to-use platforms for training and testing such RL algorithms have been lacking. A WM-based learning platform for autonomous driving would therefore greatly benefit research in this field.

Driven by these factors, the authors launched CarDreamer, the first open-source learning platform designed specifically for WM-based autonomous driving. CarDreamer facilitates rapid development and evaluation of algorithms: users can test their algorithms on the provided tasks, or quickly implement custom tasks through a comprehensive development suite. Its three key contributions are:

  1. Integrated WM algorithms for easy reproduction. CarDreamer integrates the most advanced WMs, including DreamerV2, DreamerV3, and Plan2Explore, significantly reducing the time required to reproduce the performance of existing algorithms. These algorithms are decoupled from the rest of CarDreamer and communicate through a unified Gym interface, so new algorithms can be integrated and tested directly, without additional adaptation work, as long as they support the Gym interface.
  2. Highly configurable built-in tasks for optimized rewards. CarDreamer offers a comprehensive set of driving tasks such as lane changing and overtaking. These tasks allow for extensive customization in terms of difficulty, observability, observation modes, and vehicle intent communication. They expose the same Gym interface for ease of use, and the reward functions are carefully designed to optimize training efficiency.
  3. Task Development Kit and Visualization Server. The kit simplifies the creation of custom driving tasks with API-driven traffic generation and control, and includes a modular observer that facilitates multi-modal data collection and configuration. A visualization server displays real-time agent driving video and statistics in a web browser, providing instant performance insights that accelerate reward engineering and algorithm development.

Introduction to other frameworks

Here is a brief introduction to the two cornerstones of CarDreamer: CARLA and Gym. CARLA is a high-fidelity, flexible simulator; Gym is an open-source toolkit that defines a standard interface for reinforcement learning and provides a rich set of environments and algorithms. CarDreamer builds on both for training and evaluating RL models.

CARLA is an open-source simulator designed to reproduce real-world traffic scenarios. It is based on Unreal Engine, which provides realistic physics and high-quality rendering. CARLA ships with digital assets including maps, buildings, vehicles, and various landmarks, and it supports sensors such as RGB cameras, LiDAR, and RADAR. Users can create vehicles or pedestrians and fully control these actors. CARLA is a highly versatile tool, but its main shortcoming for RL applications stems from that very generality: extracting observations such as BEV (bird's-eye view) is a tedious process, which hinders its rapid deployment for training RL algorithms.

Gym is a standard interface defined by OpenAI to standardize communication between an agent and an environment. Its core consists of two functions, reset() and step(action). The former initializes the environment to its starting state; the latter takes an action from the agent, simulates one step of the environment's evolution, and returns observation data, a reward signal, a termination indicator, and additional information. In this way, RL algorithms can be tested across many environments without extensive tuning, as long as both sides support the Gym interface. Many Gym benchmarks have been developed, such as the Atari games and the DMC suite. For WM-based RL in autonomous driving, CarDreamer builds on CARLA and exposes diverse urban driving tasks through the Gym interface to facilitate training and evaluation.
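As a concrete illustration of the reset()/step() contract described above, here is a minimal, self-contained toy environment. ToyDrivingEnv is a made-up stand-in for illustration only; it is not part of CarDreamer or Gym:

```python
# Minimal illustration of the Gym-style reset()/step() contract.
# ToyDrivingEnv is an illustrative stand-in, not a real environment.

class ToyDrivingEnv:
    """Drive along a 1-D road; the episode ends once position >= 10."""

    def reset(self):
        # Initialize the environment to its starting state.
        self.pos = 0.0
        return {"position": self.pos}

    def step(self, action):
        # Simulate one step of the environment's evolution.
        self.pos += action                  # action = forward velocity
        obs = {"position": self.pos}        # observation data
        reward = action                     # reward continuous motion
        terminated = self.pos >= 10.0       # termination indicator
        info = {}                           # additional information
        return obs, reward, terminated, info

env = ToyDrivingEnv()
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    obs, reward, done, info = env.step(1.0)  # a trivial constant policy
    total_reward += reward
```

Any agent and any environment that speak this two-call protocol can be paired, which is exactly what makes the interface convenient for testing RL algorithms across environments.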

CarDreamer Architecture

As shown in Figure 1, CarDreamer contains three main components: built-in tasks, the task development kit, and the world model backbone. The task development kit provides a variety of API functions for creating vehicles, controlling traffic flow, and planning routes in CARLA. An observer module automatically collects multi-modal observation data, such as sensor readings and BEV (bird's-eye view) images, managed by independent, customizable data handlers. These data serve a dual purpose: they feed both the tasks and the training visualization server. The visualization server displays real-time driving video and environment feedback through an HTTP server, and the world model algorithm is integrated seamlessly through the Gym interface. After receiving the agent's action, the observer module collects data from the data handlers at the next frame, continuing the cycle.

(Figure 1: CarDreamer platform overview)

A variety of realistic tasks are carefully designed, ranging from simple skills (such as lane keeping and left turns) to more complex challenges (such as random roaming across varied road conditions, including intersections, roundabouts, and different traffic flows). These tasks are highly configurable, offering many options that touch on fundamental questions in autonomous driving.

Observability and intent sharing: Partial observability is a significant challenge in reinforcement learning; incomplete state information can exponentially increase the complexity of the input space once all historical steps are included. To address the lack of autonomous-driving tools tailored to these challenges, CarDreamer provides three observability settings: 1) Field of view (FOV) includes only vehicles within the camera's field of view. 2) Shared field of view (SFOV) lets a vehicle communicate with other vehicles within its own field of view and collect their FOV data. 3) Full observability (FULL) assumes complete information about the environment and background traffic. Additionally, users can control whether a vehicle shares its intentions and with whom. These configurations map onto the fundamental questions of "what to communicate" and "with whom to communicate".

Observation modality: Users can configure the observation space to include a variety of modalities, from raw sensor data such as RGB camera and LiDAR to synthetic data such as BEV. This flexibility supports both end-to-end models that decide directly from multi-modal raw sensor data and models that use BEV perception for planning.

Difficulty: The difficulty setting primarily affects traffic density, posing significant collision-avoidance challenges. Because safety-critical events are rare for autonomous vehicles, verifying their robustness is inherently difficult. CarDreamer is specifically designed to comprehensively evaluate safety and efficiency in scenarios that simulate these rare but critical events.
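As a sketch, these knobs could be captured in a task configuration like the one below. All key names and values are illustrative assumptions, not CarDreamer's actual configuration schema:

```python
# Hypothetical task configuration illustrating the knobs described in
# the text; every key name here is made up for illustration.

task_config = {
    "task": "right_turn_hard",
    "observability": "SFOV",            # "FOV", "SFOV", or "FULL"
    "share_intent": True,               # share planned waypoints
    "observation_modes": ["bev", "camera", "lidar"],
    "difficulty": "hard",               # mainly sets traffic density
}

VALID_OBSERVABILITY = {"FOV", "SFOV", "FULL"}
assert task_config["observability"] in VALID_OBSERVABILITY
```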

Reward function: Each task in CarDreamer ships with an optimized reward function; experiments show this lets DreamerV3 successfully navigate to landmark points in only 10,000 training steps (see Section 5 for details). Notably, the empirical findings show that rewarding agents based on velocity or incremental position changes leads to better performance than rewarding based on absolute position. When rewards are based solely on position, the agent can exploit the reward function by making a small initial move and then remaining stationary, since any further movement risks a collision penalty. In practice, this suboptimal behavior is indeed observed: the learned policy converges to a local optimum, avoiding collisions by standing still. Basing rewards on speed instead forces the agent to stay in motion to accumulate reward, reducing the risk of premature convergence to an undesirable stationary policy.

The reward design also carefully considers key requirements of the driving task, such as trajectory smoothness, that traditional reinforcement learning algorithms often ignore. These algorithms typically include an entropy term in the loss function or value estimate to encourage exploration and prevent premature convergence. In the context of autonomous driving, however, this entropy term can incentivize the vehicle to follow a zigzag trajectory, since erratic motion yields a higher entropy bonus than a smoother path, even when both trajectories make similar progress toward the goal. To counteract this effect, a penalty term is introduced to discourage motion perpendicular to the target direction. The resulting reward function effectively balances goal progress with trajectory smoothness, with the following structure:

(Equation: reward function balancing goal progress and trajectory smoothness; shown as an image in the original article)
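The shaping ideas described above (reward speed toward the goal, penalize motion perpendicular to the target direction, penalize collisions) can be sketched in a few lines. This is a hedged illustration with made-up weights and function name, not CarDreamer's exact reward formula:

```python
# Hedged sketch of the reward shaping described in the text; the
# weights and function name are illustrative placeholders.

def shaped_reward(velocity, goal_direction, collided,
                  w_speed=1.0, w_lateral=0.5, collision_penalty=100.0):
    """velocity and goal_direction are 2-D (vx, vy) tuples;
    goal_direction is assumed to be a unit vector."""
    vx, vy = velocity
    gx, gy = goal_direction
    forward = vx * gx + vy * gy       # speed component along the route
    lateral = abs(vx * gy - vy * gx)  # speed perpendicular to the route
    reward = w_speed * forward - w_lateral * lateral
    if collided:
        reward -= collision_penalty   # large penalty on collision
    return reward

# Smooth forward motion earns the full speed reward; a zigzag with the
# same overall speed earns less because of the lateral penalty.
smooth = shaped_reward((5.0, 0.0), (1.0, 0.0), collided=False)  # 5.0
zigzag = shaped_reward((3.0, 4.0), (1.0, 0.0), collided=False)  # 1.0
```

Under this shaping, standing still earns nothing (so the stationary local optimum disappears) and erratic motion is strictly worse than smooth motion at the same speed.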

Interface and usage: All of CarDreamer's built-in tasks use a unified Gym interface, so reinforcement learning algorithms can be trained and tested directly without additional adjustments. Beyond out-of-the-box use, CarDreamer supports a variety of paradigms, including curriculum learning, which exploits a stepwise progression from simple to complex tasks, and continual learning, which addresses catastrophic forgetting when learning new tasks. For imitation learning, CarDreamer also simplifies the process of collecting observation data in the simulator. Although originally designed for WM-based reinforcement learning algorithms, the Gym interface enables broad use across algorithmic strategies.

1) Task Development Kit

For users who need custom tasks, CarDreamer provides a highly modular task development kit that accommodates different levels of customization. The first module is the "World Manager", which covers basic needs such as changing the driving scene via different maps, routes, spawn locations, or background traffic flows. The World Manager manages "actors", a term borrowed from CARLA that covers all entities such as vehicles, pedestrians, traffic lights, and sensors. It provides API calls to spawn actors, in particular vehicles at different locations with default or custom blueprints. These vehicles can be controlled by the user or by Autopilot, a simple rule-based driving algorithm. On reset, the manager transparently destroys actors and releases their resources. The second module is the "Observer", which automatically collects observation data in various modalities. It gives users easy access to predefined observation modes without manual work, while also supporting extensive customization of data specifications. This is achieved through a series of data handlers, each providing data for one modality, such as an RGB camera handler and a BEV handler. Each data handler is highly modular and independently manages the entire lifecycle of its data type; users can extend the Observer by registering a new data handler that suits their needs.
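The handler-per-modality design might look like the following sketch. The class and method names are assumptions for illustration only, not CarDreamer's actual API:

```python
# Illustrative sketch of a modular Observer with pluggable data
# handlers; all names are made up and do not mirror the real API.

class DataHandler:
    """Base class: each handler owns one observation modality."""

    def get_observation(self, world):
        raise NotImplementedError


class SpeedHandler(DataHandler):
    """Toy handler that reports the ego vehicle's speed."""

    def get_observation(self, world):
        return {"speed": world.get("ego_speed", 0.0)}


class Observer:
    def __init__(self):
        self._handlers = {}

    def register(self, name, handler):
        # Users extend the observer by registering new handlers.
        self._handlers[name] = handler

    def observe(self, world):
        # Gather one entry per registered modality.
        return {name: h.get_observation(world)
                for name, h in self._handlers.items()}


observer = Observer()
observer.register("speed", SpeedHandler())
obs = observer.observe({"ego_speed": 8.5})
```

Because each handler owns the full lifecycle of its modality, adding a new observation mode never requires touching the observer core or the other handlers.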

The third module contains route planners that meet diverse task routing needs. CarDreamer includes several planners: a random planner for exploratory roaming across the map; a fixed-path planner that creates waypoints connecting user-defined locations; and a fixed-endpoint planner that uses the classic A* algorithm to generate a route from the current location to a specified endpoint. For further customization, a base class is also provided; users can develop their own planners by overriding the init_route() and extend_route() methods, which define route initialization and per-time-step extension, respectively. Additionally, the suite includes a visualization server that seamlessly integrates the Observer's output and other statistics fed back from the environment and displays them via an HTTP server. This automation provides rapid feedback for reward engineering and algorithm development without additional coding effort.
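A custom planner built on the two override points named in the text (init_route() and extend_route()) might look like this sketch. Only those two hook names come from the description; the base class shown is a simplified stand-in for CarDreamer's actual one:

```python
# Sketch of the planner extension pattern described above; only the
# init_route()/extend_route() hook names come from the text.

class BasePlanner:
    def __init__(self):
        self.route = []
        self.init_route()

    def init_route(self):
        # Define the initial route; subclasses must override.
        raise NotImplementedError

    def extend_route(self):
        # Extend the route; called once per time step.
        raise NotImplementedError


class StraightLinePlanner(BasePlanner):
    """Toy planner: extend the route 1 m forward along x each step."""

    def init_route(self):
        self.route = [(0.0, 0.0)]

    def extend_route(self):
        x, y = self.route[-1]
        self.route.append((x + 1.0, y))


planner = StraightLinePlanner()
for _ in range(3):
    planner.extend_route()   # route now spans 3 m along the x-axis
```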

2) World Model Backbone

The world model backbone in CarDreamer seamlessly integrates the most advanced methods, including DreamerV2, DreamerV3, and Plan2Explore, enabling rapid reproduction of these models. The backbone is carefully designed to decouple the world model implementation from task-specific components, increasing modularity and scalability. Communication between components is managed efficiently through the standard Gym interface, allowing extensive customization. This decoupling lets users easily replace the default world model with their own implementation, enabling rapid prototyping, benchmarking, and comparative analysis against established baselines. CarDreamer thus provides a comprehensive testing platform for world-model-based algorithms, fostering an ecosystem that accelerates research and development in this field. The platform encourages users to explore innovative architectures, loss functions, and training strategies within a consistent, standardized evaluation framework of diverse driving tasks and performance metrics.

CarDreamer Task Experiments

A small DreamerV3 model with only 18 million parameters (shown in Figure 4) serves as the backbone. This model uses a CNN multiplier of 32, 512 GRU and MLP units, and only two MLP layers in its RSSM. Its modest memory footprint of about 10 GB allows training on a single NVIDIA 4090 GPU while running the CARLA simulator; the agent is trained separately on each task.
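Written out as a plain dictionary, the reduced configuration reads as follows. The key names are illustrative for readability, not the actual DreamerV3 config schema:

```python
# The reduced DreamerV3 setup from the text as a hypothetical config
# dict; the key names are made up for illustration.

small_dreamerv3 = {
    "cnn_multiplier": 32,        # CNN channel multiplier
    "gru_units": 512,            # recurrent units in the RSSM
    "mlp_units": 512,
    "mlp_layers": 2,             # RSSM MLPs are only two layers deep
    "total_params": 18_000_000,  # roughly 18M parameters
    "gpu_memory_gb": 10,         # fits a single NVIDIA 4090 with CARLA
}
```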


The reward curves over training time steps are shown in Figure 2.

(Figure 2: reward curves over training time steps)

Simple tasks with low traffic volume, such as simple right turns and lane merging, typically converge within 50,000 steps (about 1 hour), while tasks involving denser, more aggressive traffic and demanding collision avoidance require approximately 150,000 to 200,000 steps (about 3 to 4 hours) to converge. For evaluation, several metrics rigorously assess the performance of the autonomous driving agent on CarDreamer tasks, as detailed in Table 1. These metrics include:

• Success rate (%): the percentage of episodes in which the agent vehicle successfully completes its task (reaching a destination, or traveling a predetermined distance without an accident or deviating from its lane).

• Average distance (m): the average distance traveled by the agent vehicle per episode before the episode ends (either by completing the task or through failure, such as a collision or timeout).

• Collision rate (%): the percentage of episodes in which the agent vehicle collides.

• Average speed (m/s): the average speed maintained by the agent vehicle throughout the task, reflecting its ability to balance speed and safety and how efficiently it navigates its environment.

• Waypoint distance (m): the average deviation from the desired route waypoints, evaluating the vehicle's ability to follow the planned path and reflecting its navigation accuracy along a given trajectory.
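Given per-episode records, the metrics above reduce to simple averages over episodes. A hedged sketch, where the record field names are illustrative placeholders:

```python
# Sketch of computing the evaluation metrics from episode records;
# the record field names are made-up placeholders.

def summarize(episodes):
    n = len(episodes)
    return {
        "success_rate_pct": 100.0 * sum(e["success"] for e in episodes) / n,
        "avg_distance_m": sum(e["distance_m"] for e in episodes) / n,
        "collision_rate_pct": 100.0 * sum(e["collided"] for e in episodes) / n,
        "avg_speed_mps": sum(e["avg_speed_mps"] for e in episodes) / n,
        "avg_waypoint_dist_m": sum(e["waypoint_dist_m"] for e in episodes) / n,
    }

episodes = [
    {"success": True, "distance_m": 120.0, "collided": False,
     "avg_speed_mps": 6.0, "waypoint_dist_m": 0.4},
    {"success": False, "distance_m": 45.0, "collided": True,
     "avg_speed_mps": 5.0, "waypoint_dist_m": 0.8},
]
stats = summarize(episodes)
```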

(Table 1: performance metrics of the agent across CarDreamer tasks)

1) Prediction under Different Observation Modalities

The imagination capability of the world model (WM) enables it to effectively predict future scenarios and manage potential incidents. To evaluate WM imagination under different observation modalities, experiments were conducted on the Right Turn Hard task with three modalities: bird's-eye view (BEV), camera, and LiDAR. For each modality, the WM must imagine several future steps of observations given a starting state and a sequence of actions. Figure 4 shows the results, comparing real and imagined images across the three modalities: the first row shows the real observations, the second row the WM's imagination, and the third row the difference between them. Frames were selected up to 64 time steps into the imagination horizon. These findings suggest that, despite the different modalities, the WM excels at accurately predicting the future. In the BEV experiment (a), the WM accurately predicted the positions and trajectories of straight-going and right-turning vehicles, as well as the rotation and translation of the BEV relative to the ego vehicle. Similarly, in the camera and LiDAR settings, the WM successfully predicted vehicles traveling in front of the ego vehicle.

(Figure 4: real vs. imagined observations under BEV, camera, and LiDAR)

2) Benefits of Vehicle-to-Vehicle Communication

One of CarDreamer's unique features is the ability to easily customize the level of communication between vehicles. Vehicles can share field-of-view (FOV) observations, enabling different degrees of observability; they can even share intents (represented by their planned waypoints) for better planning. This feature is used to evaluate the impact of communication: an agent was trained and tested under different settings of the Right Turn Hard task, varying observability and whether the intents of other vehicles are accessible. This task is particularly suitable for testing observability and intent communication because of its dense traffic and frequent potential collisions with vehicles outside the field of view. The reward curves are shown in Figure 5, and performance metrics in Table 2. Note that under this reward function, successfully executing the right turn corresponds roughly to a reward above 250. The results show that limited observability or a lack of intent sharing prevents the agent from completing the task. Uniformly sampled frames from one episode in Figure 6 explain why: the agent adopts a conservative, suboptimal strategy, stopping at the intersection to avoid collisions. In the first three rows of Figure 6, for example, the agent stops moving before merging into traffic. In contrast, complete information enables the ego vehicle to execute the right turn successfully.

(Figure 5, Figure 6, and Table 2: results under different communication settings)

