Home >Technology peripherals >AI >Detailed explanation of autonomous driving decision planning technology
With the rapid development of deep reinforcement learning technology, more and more research teams have begun to apply it to autonomous driving decision planning, integrating behavioral decision-making with motion planning modules to directly learn driving trajectories.
The decision-making planning module in autonomous driving is one of the core indicators for measuring and evaluating autonomous driving capabilities. Its main task is to analyze the current environment after receiving various sensory information from sensors, and then Issue instructions to the underlying control module. A typical decision planning module can be divided into three levels: global path planning, behavioral decision-making, and motion planning.
In a complete autonomous driving system, if the perception module is compared to human eyes and ears, then the decision-making planning is the brain of autonomous driving. After receiving various sensory information from the sensor, the brain analyzes the current environment and then issues instructions to the underlying control module. This process is the main task of the decision-making and planning module. At the same time, how complex scenarios the decision-making planning module can handle is also one of the core indicators for measuring and evaluating autonomous driving capabilities [1].
Figure 1. The hierarchical structure of the decision-making planning module in the autonomous driving system, quoted from [2]
As shown in Figure 1, a typical decision-making The planning module can be divided into three levels.
Among them, after receiving a given driving destination, the global route planning (Route Planning) combines the map information to generate a global route as a reference for subsequent specific route planning;
After receiving the global path, the behavioral decision-making layer combines the environmental information obtained from the perception module (including other vehicles and pedestrians, obstacles, and traffic rule information on the road) to make specific behavioral decisions (such as Choose to change lanes to overtake or follow);
Finally, based on specific behavioral decisions, the Motion Planning layer plans to generate a path that satisfies specific constraints (such as the dynamic constraints of the vehicle itself, collision avoidance, and passenger comfort). properties, etc.), which is used as the input of the control module to determine the final driving path of the vehicle.
This article will introduce the main functions and common algorithms of each layer respectively, and compare the advantages and disadvantages of various algorithms and applicable scenarios.
Global path planning refers to selecting an optimal path through search after given the current position of the vehicle and the end target. The "optimal path" here is "Including conditions such as the shortest path or the fastest arrival time. This process is similar to the "navigation" function we often use in our lives. The difference is that the high-precision map used in autonomous driving is different from our common maps. The high-precision map contains more information about each lane. More information. Common global path planning algorithms include Dijkstra and A algorithms, as well as various improvements based on these two algorithms. Dijkstra's algorithm [3] and A* algorithm [4] are also the two most widely used search algorithms in many planning problems.
Figure 2. Global path planning diagram
1. Dijkstra algorithm
Dijkstra algorithm was developed by computer scientists Edsger W. Dijkstra proposed it in 1956 to find the shortest path between nodes in a graph. In Dijkstra's algorithm, the total movement cost of each node from the starting point needs to be calculated. At the same time, a priority queue structure is also needed. All nodes to be traversed will be sorted according to cost when placed in the priority queue. During the running of the algorithm, the node with the smallest cost is selected from the priority queue as the next traversed node each time. until you reach the end.
The advantage of Dijkstra's algorithm is that the path given is the optimal; the disadvantage is that the calculation time complexity is relatively high (O(N2)), because it explores the surroundings without a clear direction.
2. A* algorithm
In order to solve the search efficiency problem of Dijkstra algorithm, in 1968, the A algorithm was developed by Peter Hart, Nils Nilsson and Bertram Raphael of Stanford Research Institute. Published, its main improvement is to use a heuristic function to guide the search process. Specifically, Algorithm A calculates the priority of each node through the following function:
f(n)=g(n) h(n)
where:
After determining the global path, the autonomous vehicle needs to make appropriate decisions based on specific road conditions, traffic rules, other vehicles and pedestrians, etc. behavioral decisions.
This process faces three main problems:
First, the real driving scenes are ever-changing, how to cover them?
Secondly, the real driving scene is a multi-agent decision-making environment. The behavior of each participant, including the main vehicle, will have an impact on other participants in the environment, so we It is necessary to predict the behavior of other participants in the environment;
Finally, it is impossible for autonomous vehicles to perceive 100% of environmental information, for example, there are many possible dangerous situations that are blocked by obstacles.
Based on the above points, in the autonomous driving behavior decision-making layer, what we need to solve is the planning problem of perceptual uncertainty in the complex environment of multi-agent decision-making. It can be said that this problem is one of the core bottlenecks in truly realizing L4 and L5 level autonomous driving technology. In recent years, with the rapid development of deep reinforcement learning and other fields, new ideas and dawn have been brought to solve this problem.
The models of the behavioral decision-making layer are divided into four categories and introduced separately [5]:
1. Finite state machine model
Autonomous driving vehicle The initial decision-making model is the finite state machine model [6]. The vehicle selects appropriate driving behaviors according to the current environment, such as parking, changing lanes, overtaking, avoidance, slow driving, etc. The state machine model constructs a finite directed connected graph. To describe different driving states and the transition relationships between states, so as to reactively generate driving actions based on the transition of driving states.
The finite state machine model is the most widely used behavioral decision-making model in the field of autonomous driving because of its simplicity and ease of implementation. However, this type of model ignores the dynamics and uncertainty of the environment. In addition, when the driving scene characteristics In many cases, the division and management of states are cumbersome and are mostly suitable for simple scenarios. It is difficult to perform behavioral decision-making tasks in urban road environments with rich structural features.
2. Decision tree model
The decision/behavior tree model [7] is similar to the state machine model. It also selects different parameters reactively through the attribute values of the current driving state. Driving actions, but the difference is that this type of model solidifies the driving status and control logic into a tree structure, and searches for driving strategies through a top-down "polling" mechanism. This type of decision-making model has visual control logic, and the control nodes can be reused, but it needs to define the decision-making network offline for each driving scenario. When the state space and behavior space are large, the control logic will be more complex. In addition, this type of model is also unable to consider the uncertainty factors existing in the traffic environment.
3. Knowledge-based reasoning and decision-making model
The knowledge-based reasoning and decision-making model imitates the behavior of human drivers by the mapping relationship of "scene characteristics-driving actions" Decision-making process, this type of model stores driving knowledge in a knowledge base or neural network. The driving knowledge here is mainly represented by the mapping relationship between rules, cases or scene features to driving actions. Then, driving actions are inferred from the knowledge base or trained network structure through the "query" mechanism.
This type of model mainly includes: rule-based reasoning system [8], case-based reasoning system [9] and neural network-based mapping model [10].
This type of model relies heavily on prior driving knowledge and training data, and requires careful organization, management, and updating of driving knowledge. Although the mapping model based on neural networks can omit data annotation and knowledge integration process, but there are still the following shortcomings:
4. Value-based decision-making model
According to the maximum utility theory, the basic idea of the utility/value-based decision-making model is to select multiple criteria based on the selection criteria. Select the optimal driving strategy/action among alternatives [12].
In order to evaluate the quality of each driving action, this type of model defines a utility (utility) or value (value) function to quantitatively evaluate the degree to which the driving strategy meets the driving task objectives based on certain criterion attributes. For unmanned For driving tasks, these criterion attributes can be safety, comfort, driving efficiency, etc. Utility and value can be determined by a single attribute or multiple attributes.
Furda and Vlacic of Griffith University in Australia proposed a multi-criteria decision-making method to select the optimal driving action from the candidate action set [13]; Bandyopadhyay et al. of the National University of Singapore proposed a behavioral decision-making based on POMDP model [14] to solve situations where there is perceived uncertainty; Wei J and others from Carnegie Mellon University proposed a behavioral decision-making model [15] based on PCB (Prediction and-Cost-function Based), which focuses on It lies in how to construct an appropriate cost function to guide the prediction of the environment; in order to solve the decision-making problem in a complex environment involving multiple agents, many models based on game theory are also used by researchers to reason about the interaction between vehicles [16 ], [17]; in addition, due to its advantages in feature extraction, deep reinforcement learning technology has also begun to be widely used to complete the generation of optimal driving actions [18].
After determining the specific driving behavior, what we need to do is to transform the "behavior" into a more specific driving "trajectory" so that we can finally Generate a series of specific control signals for the vehicle to achieve the vehicle's driving according to the planned target. This process is called motion planning. The concept of motion planning has a long history of research in the field of robotics. From a mathematical perspective, we can regard it as an optimization problem as follows:
Path Planning
#Figure 3. Definition of path planning
In many scenarios represented by robots, We can think of our surroundings as certain. In this case, the so-called path planning refers to finding a mapping σ:[0,1]➞Χ that satisfies certain constraints in a given state space Χ. These constraints include:
The objective functional of this optimization problem is defined as J(σ), and its specific meaning can be expressed as path length, control complexity and other measures.
However, in the automatic driving problem, the environment around the vehicle is constantly changing dynamically, so simple path planning cannot give a solution that is always valid during the driving process, so we need to add a dimension - time T , the corresponding planning problem is usually called trajectory planning.
Trajectory Planning
Figure 4. Definition of Trajectory Planning
Increase of time dimension poses a huge challenge to planning issues. For example, for a robot moving in a 2D environment, which is abstracted as a single point, the obstacles in the environment are approximated as polygons. The path planning problem can be solved in polynomial time, while the trajectory planning problem that adds the time dimension has been proven to be an NP-hard problem.
In the actual scenario of autonomous driving, whether it is the vehicle itself or the surrounding environment, establishing a more accurate model means more complex constraints on the optimization problem, and it also means that the solution is more difficult. Therefore, the algorithms actually used are based on the approximation of actual scenarios, and seek an optimal balance between model accuracy and solution efficiency.
The following introduces several common types of motion planning algorithms in the field of autonomous driving. In practice, it is often the combination of several types of ideas that can ultimately achieve better planning results and satisfy more different needs. Scenes.
1. Search-based planning algorithm
Solving motion planning problems through search is one of the simplest ideas. The basic idea is to pass the state space through a determined The method is discretized into a graph, and then various heuristic search algorithms are used to search for feasible solutions or even optimal solutions.
In the process of discretizing the state space, attention should be paid to ensuring that the final grid has the largest coverage area and does not repeat. As shown in Figure 5, the grid on the left is generated by three behaviors: go straight, turn left 90°, and turn right 90°; and if you choose three behaviors: go straight, turn left 89°, and turn right 89°, you will not be able to Generate a grid structure covering the entire area.
Figure 5. Constructing a raster graph, quoted from [2]
After rasterizing the state space, we can use Dijkstra, which has been introduced previously. A* search algorithm to complete the final planning. However, in actual complex environments, there are many grids and the environment changes dynamically over time, which will lead to too many search nodes. Therefore, a variety of improved algorithms have been developed to deal with different specific scenarios:
1) Hybrid A* algorithm, based on the A* algorithm, considers the maximum steering problem of the vehicle. For example, the maximum steering direction of the vehicle on the calculated path is limited to no more than 5°. The current application scenarios of this algorithm include car U-turns (the Junior car used by Stanford to participate in the DARPA challenge used this algorithm to perform uturn), parking and other scenarios that require high steering wheel control.
2) D* and D*Lite algorithms search from the end point to the starting point in advance, using the Dijkstra algorithm to store the shortest path length k from the target point to each point in the road network, and the shortest path length k from the node to the target point. The actual length value h, in the initial case k==h, and the previous node of each node is stored to ensure that the link can be followed.
After the calculation is completed, an optimal path at that time is obtained. When the car travels to a certain node and finds that the node is impassable (there are obstacles) through the sensor, the h value of some relevant points in the stored road network information is modified (increased), and a neighbor point is selected that still satisfies the problem. h==k, that is, the point on the optimal path is still the next point.
Then walk to the end. This type of algorithm is suitable for navigation and path planning in unknown environments, and is widely used in various current mobile robots and autonomous vehicles, such as the "Opportunity" and "Spirit" Mars rovers.
2. Sampling-based planning algorithm
By sampling the continuous state space, the original problem is approximated into a discrete sequence optimization problem. This idea It is also the most widely used algorithm in computer science. In motion planning problems, basic sampling-based algorithms include Probabilistic Roadmap (PRM) and Rapid Search Random Tree (RRT) algorithms.
Figure 6. U-shaped curve trajectory planning using RRT algorithm, quoted from [19]
1) Basic algorithm: probabilistic route Figure (PRM)
It is easy to see that once a PRM is constructed, it can be used to solve motion planning problems in different initial and target states, but this feature is unnecessary for autonomous driving motion planning. In addition, PRM requires precise connections between states, which is very difficult for motion planning problems with complex differential constraints.
2) Basic algorithm: Rapid Search Random Tree (RRT)
RRT is an incremental sampling search method that does not require setting any resolution parameters. In the extreme case, the search tree will densely cover the entire space. At this time, the search tree is composed of many shorter curves or paths to achieve the purpose of filling the entire space.
3) Various improved algorithms
From the description of the above basic algorithm, we can understand that sampling the state space can ensure that the starting point and the end point are connected. Feasible solution, but because the sampling process uniformly samples the entire space, the efficiency is very low; real-time solution cannot be achieved in complex scenarios; in addition, the final planning result cannot guarantee that the feasible solution obtained is the optimal solution. In response to these disadvantages, a variety of improved algorithms have been proposed and applied to autonomous driving problems:
- RRT-Connect: build two Two trees start from the initial state and the target state respectively. When the two trees grow together, a feasible solution is found.
- Heuristic (hRRT): Use a heuristic function to increase the probability that nodes with low expansion costs are sampled.
- Combined with the driver model: Combined with the driver visual attention model for biased sampling, using visual feature information to guide motion planning, so that the planned trajectory is more consistent with human driving behavior.
- Construct a new metric RG-RRT (reachability guided RT): the conventional Euclidean distance metric cannot truly reflect the distance between configurations or states. RG-RRT calculates the reachability set of nodes in the tree , when the distance from the sampling point to the node is greater than the distance from the sampling point to the reachable set of the node, the node may be selected for expansion.
- Add obstacle penalty (RC-RRT, EG-RRT, ADD-RRT, etc.): Reduce the probability that nodes close to obstacles will be expanded.
- anytime RRT first quickly builds an RRT, obtains a feasible solution and records its cost. After that, the algorithm will continue sampling, but it will only benefit Nodes that reduce the cost of feasible solutions are inserted into the tree, thereby gradually obtaining better feasible solutions.
- Replanning decomposes the entire planning task into a number of equal-time sub-task sequences, and plans the next task while executing the current task.
- PRM*, RRG, RRT*: According to the random geometric graph theory (randomly sample m points in the state space, and divide the distance Points smaller than r(n) are connected to form a random geometric graph) The standard PRM and RRT are improved, and the PRM*, RRG and RRT* algorithms with asymptotic optimal properties are obtained
3. Direct optimization method
In most cases, regardless of height changes, the trajectory planning problem of autonomous driving is a three-dimensional constrained optimization problem (2D space time T). Therefore, We can use a decoupling strategy to decompose the original problem into several low-dimensional problems, thereby greatly reducing the difficulty of solving it.
1) Frenet coordinate system
Figure 7. Frenet coordinate system
Due to the real world The roads in are all curved. In order to simplify the parameter expression of solving optimization problems, the Frenet coordinate system is usually used in autonomous driving.
In the Frenet coordinate system, we use the center line of the road as the reference line, and use the tangent vector t and normal vector n of the reference line to establish a coordinate system, as shown in the figure on the right, which takes the vehicle itself as The origin, the coordinate axes are perpendicular to each other, divided into the s direction (that is, the direction along the reference line, often called longitudinal, Longitudinal) and the d direction (or L direction, that is, the current normal direction of the reference line, called horizontal, Lateral ), compared to the Cartesian coordinate system (left picture), the Frenet coordinate system significantly simplifies the problem.
Because when driving on the road, we can always easily find the reference line of the road (i.e., the center line of the road), then the position based on the reference line can simply be expressed using the longitudinal distance S (i.e., along the It is described by the distance along the road direction) and the lateral distance L (that is, the distance away from the reference line).
2) Path-speed decoupling method
#In the Frenet coordinate system, the path-speed decoupling method optimizes the path and speed respectively. Path optimization mainly Considering static obstacles, a static reference path (SL dimension) is generated through dynamic planning, and then based on the generated path, speed planning (ST dimension) is considered. This process can be iterated continuously, enabling real-time updates to the trajectory. The EM planner used in Baidu's open source autonomous driving platform Apollo is based on a similar solution. This solution has strong flexibility and can be universally applied to many scenarios.
In addition, you can also choose different decoupling methods, such as planning the longitudinal trajectory (ST dimension) and transverse trajectory (LT dimension) separately. However, it should be pointed out that the solution obtained through the decoupling method may not be optimal, and this algorithm is not complete, and a feasible solution may not be found in some complex environments.
4. Parametric curve construction method
Figure 8. Common parametric curve construction method, quoted from [19]
The starting point of the parametric curve construction method is the constraints of the vehicle itself, including kinematics and dynamics constraints. Therefore, the generally planned path needs to have continuous curvature. This type of method considers obstacles based on the starting point and target point, and gives a smooth path by constructing a family of curves that conform to vehicle constraints.
As shown in Figure 8, common curves include Dubins curve (consisting of straight lines and arcs, which is the shortest curve family of a simple vehicle model Dubin model in two-dimensional space), clothoid curve, and polynomial curve , Bezier curve, spline curve, etc. It is difficult to simply apply the parametric curve construction method to meet actual complex scenarios. Therefore, more and more autonomous driving systems now combine it with other methods to smooth the planned and generated trajectories to satisfy vehicle kinematics and dynamics. Learn constraints.
5. Artificial potential field method
The artificial potential field method is inspired by the electromagnetic field in physics. It is assumed that obstacles and target positions generate repulsion and gravity respectively, so that the path can be planned along the fastest gradient descent of the potential field. A key issue in this type of method is how to select an appropriate potential field function. For example, Stephen Waydo uses flow functions for smooth path planning [20], and Robert Daily proposes a harmonic potential field path planning method on high-speed vehicles [21]. In simple scenarios, the artificial potential field method has high solution efficiency, but its biggest problem is that it may fall into a local minimum. In this case, the path obtained is not optimal, and the path may not even be found.
In planning problems, in addition to considering its time and space complexity, the evaluation of an algorithm must also consider whether it has completeness and Optimality, take a step back and consider whether it has probabilistic completeness and asymptotic optimality. Only on the basis of understanding these properties can we design and apply different algorithms for different actual scenarios, so as to achieve the best balance between model complexity and optimal efficiency.
1) Completeness: If there is a path solution between the starting point and the target point, then the solution must be obtained. If the solution cannot be obtained, it must mean that there is no solution;
2) Probabilistically Completeness: If there is a path solution between the starting point and the target point, as long as the planning or search time is long enough, you will be sure to find a path solution;
3) Optimality: The planned path is optimal on a certain evaluation index (the evaluation index is generally the length of the path)
4) Asymptotically optimality: after a finite number of The path obtained after planning iterations is a suboptimal path that is close to the optimal, and it is closer to the optimal path after each iteration. It is a gradual convergence process
Table 1 Comparison of common algorithms
Figure 9. Autonomous driving development timeline and important motion planning algorithms in the process, quoted from [19 ]
Humanity’s interest in autonomous driving can be traced back to 1925. In recent years, the research boom on autonomous driving began with the U.S. Defense Advanced Research Projects Agency (DARPA) in 2004-2007. The 3rd Autonomous Driving Challenge held [22] is shown in Figure 9. After that, the effectiveness of the various decision-making planning methods mentioned above were actually verified. At the same time, solutions that combine motion planning methods with control theory, state parameter estimation, machine learning and other multi-field methods continue to emerge, becoming a future development trend:
1) Combined with vehicle dynamics: Combining dynamic parameter evaluation indicators with optimal planning, planning from the perspective of optimal control is a method often used in recent years. In this process, vehicle dynamics factors can be fully considered, and the planned trajectory is more reasonable. For example, use model predictive control theory (Model Predictive Control). The disadvantage is that the more constraints there are on the vehicle, the more difficult it is to optimize its trajectory, and it is more difficult to achieve online real-time calculation.
2) Combined with state parameter estimation: state parameter estimation can obtain vehicle parameters more accurately, so the state estimator can be added to the planning module to improve the trajectory by estimating the vehicle state online and feeding it back to the planner. quality. For example, different ground types will cause changes in vehicle slip characteristics, thereby affecting the vehicle status. By combining estimated parameters, the trajectory can be re-planned in real time and closed-loop planning can improve trajectory safety.
3) Combining with machine learning: With the rapid development of artificial intelligence represented by neural networks, many traditional planning problems have also brought new solutions. In the field of autonomous driving, its development trends include:
The above is the detailed content of Detailed explanation of autonomous driving decision planning technology. For more information, please follow other related articles on the PHP Chinese website!