Over 70% mAP for the first time! GeMap: Local high-precision map SOTA refreshed again
Building vectorized high-precision maps from sensor data in real time is crucial for downstream tasks such as prediction and planning, and it effectively compensates for the poor real-time performance of offline high-precision maps. With the development of deep learning, online vectorized high-precision map construction has gradually emerged, and representative works such as HDMapNet and MapTR have appeared one after another. However, existing online vectorized map construction methods have not explored the geometric properties of map elements, including the shape of elements and geometric relationships such as perpendicularity and parallelism.
Vectorized high-precision maps abstract the elements on the road and represent each map element as a two-dimensional point sequence. Urban roads are designed to specific standards. For example, in most cases pedestrian crosswalks are rectangles or parallelograms; on road sections without diverging or merging, two adjacent lanes are parallel to each other. Different elements in high-precision maps also share many similar characteristics. These common-sense rules are abstracted into the geometric properties of high-precision maps, which include the shape of map elements (rectangles, parallelograms, straight lines, etc.) and the associations between different map elements (parallel, perpendicular, etc.). Geometric properties strongly constrain the representation of map elements. If an online construction model fully understands these geometric properties, it can produce more accurate results.
Although existing models could in theory still learn the geometric properties of map elements, the nature of these properties means that, at least under traditional designs, they are not easy for a model to learn.
When the ego vehicle drives straight, changes lanes, or turns, the absolute coordinates of map elements (in the vehicle coordinate system) change constantly. The shapes of crosswalks, lanes, road boundaries, and so on do not change; likewise, the parallel relationship between lanes does not change. The geometric properties of map elements are objective, and one of their important characteristics is invariance. More specifically, it is rigid invariance: they remain unchanged under rotation and translation. Previous work, whether it used a simple polyline representation or polynomial curves with control points (such as Bezier curves or piecewise Bezier curves), was built on absolute coordinates and optimized end-to-end on absolute coordinates. An optimization objective based on absolute coordinates is not rigidly invariant, so it is hard to expect that the local optimum the model falls into includes an understanding of geometric properties. A representation that fully characterizes geometric properties and has this invariance is therefore necessary.
Figure 1. Example of geometric invariance.
When the vehicle turns right, the absolute coordinates will change significantly. The image on the right shows a corresponding real-life scenario.
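The invariance argument can be checked numerically. The following is a minimal sketch, not the authors' code: the crosswalk coordinates, the 30-degree rotation, and the helper names `rigid_transform` and `edge_lengths` are all illustrative assumptions. It shows that a rigid transformation changes every absolute coordinate while leaving the side lengths untouched.

```python
import math

def rigid_transform(points, angle_deg, tx, ty):
    """Rotate points by angle_deg about the origin, then translate by (tx, ty)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def edge_lengths(points):
    """Lengths of the offset vectors between consecutive points of a closed polygon."""
    n = len(points)
    return [math.dist(points[i], points[(i + 1) % n]) for i in range(n)]

# Hypothetical crosswalk: a 4 m x 2 m rectangle in the vehicle frame.
crosswalk = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
# The same crosswalk seen from a different ego pose (for example, during a turn).
moved = rigid_transform(crosswalk, angle_deg=30.0, tx=5.0, ty=-1.5)

coords_changed = any(
    not math.isclose(p[0], q[0]) or not math.isclose(p[1], q[1])
    for p, q in zip(crosswalk, moved)
)
lengths_kept = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(edge_lengths(crosswalk), edge_lengths(moved))
)
print(coords_changed, lengths_kept)
```

A loss defined on the coordinates sees two different targets here, whereas a loss defined on the side lengths sees the same one.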
Furthermore, despite strong prior knowledge, the geometric properties of roads are still diverse. They can generally be divided into two categories: the geometric shape of a single map element, and the geometric association between different map elements. Because of this diversity, it is impossible to exhaustively and manually convert geometric properties into constraints, so we prefer that the model learn a variety of geometric properties autonomously and end-to-end.
Geometric representation
To address the two problems above, we first improve the representation. In addition to the traditional representation based on absolute coordinates, we want to introduce a geometric representation that is rigidly invariant and can characterize both the shape of a single element and the association between elements.
To ensure translation invariance, we use a relative quantity, the offset vector between points. To further ensure rotation invariance, we use the length of each offset vector and the angle between different offset vectors. These two quantities, length and angle, form the basis of the geometric representation we propose. To better distinguish and describe the two types of geometric properties, shape and association, we further refine the design following the principle of simplicity:
To describe shape, we compute the offset vectors between adjacent points within a single map element, then compute the length of each offset vector and the angle between adjacent offset vectors. This representation uniquely identifies any polyline or polygon. Two examples are shown in Figure 2.
Figure 2. Geometric shape representation.
A rectangle can be described by a right angle and two pairs of equal sides; for a straight line, all included angles are 0 or 180 degrees.
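As an illustration of the shape representation, here is a minimal sketch that computes offset lengths and the signed angles between adjacent offsets for one element. The function name `shape_representation`, the `closed` flag, and the sample coordinates are assumptions made for this example rather than details from the original work.

```python
import math

def shape_representation(points, closed=True):
    """Return (lengths, angles_deg) for one map element.

    lengths: length of each offset vector between adjacent points.
    angles_deg: signed angle from each offset vector to the next one.
    """
    n = len(points)
    m = n if closed else n - 1  # number of offset vectors
    offsets = [
        (points[(i + 1) % n][0] - points[i][0],
         points[(i + 1) % n][1] - points[i][1])
        for i in range(m)
    ]
    lengths = [math.hypot(dx, dy) for dx, dy in offsets]
    k = m if closed else m - 1  # number of adjacent offset pairs
    angles = []
    for i in range(k):
        (ax, ay), (bx, by) = offsets[i], offsets[(i + 1) % m]
        dot = ax * bx + ay * by
        cross = ax * by - ay * bx
        angles.append(math.degrees(math.atan2(cross, dot)))
    return lengths, angles

# Hypothetical crosswalk (closed rectangle) and lane divider (open straight line).
rect_lengths, rect_angles = shape_representation(
    [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)], closed=True)
line_lengths, line_angles = shape_representation(
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)], closed=False)

print(rect_lengths, rect_angles)  # two pairs of equal sides, right angles
print(line_lengths, line_angles)  # equal steps, zero turning angle
```

Using `atan2` on the cross and dot products keeps the sign of the turn, which is what lets lengths plus angles pin down the polyline up to a rigid motion.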
To characterize association, we similarly start with the distance between any two points. However, if angles were computed for all point-to-point offset vectors, the representation would be too complex and the computational cost unaffordable. Specifically, with a fixed number of map elements, each represented by a fixed number of points, the number of angles grows with the fourth power of the total number of points (with 1000 points in total and each angle stored as a 32-bit floating-point number, this part of the representation alone would occupy space on the TB scale). This is not necessary for ordinary relationships such as perpendicular and parallel. Therefore, we first compute the offsets within each element, and then only compute the angles between pairs of these offsets as part of the geometric representation. This simplified association representation keeps the ability to describe parallel, perpendicular, and similar relationships, while the amount of data grows only with the square of the total number of points (roughly 4 MB under the same conditions). For ease of understanding, we also provide some examples:
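The storage estimate can be reproduced with a few lines of arithmetic. This sketch assumes that the figure of 1000 refers to the total number of points across all map elements, and that the simplified scheme keeps roughly one offset vector per point; both are readings of the text, not confirmed details.

```python
BYTES_PER_FLOAT32 = 4

def full_angle_bytes(total_points):
    """Angles between all pairs of point-to-point offset vectors.

    Every ordered pair of points gives one offset vector (total_points ** 2),
    and every pair of offset vectors gives one angle.
    """
    return total_points ** 4 * BYTES_PER_FLOAT32

def simplified_angle_bytes(total_points):
    """Angles between pairs of within-element offsets (about one per point)."""
    return total_points ** 2 * BYTES_PER_FLOAT32

total_points = 1000  # assumed total number of points over all elements
print(full_angle_bytes(total_points) / 1e12, "TB")
print(simplified_angle_bytes(total_points) / 1e6, "MB")
```

Under these assumptions the full scheme needs about 4 TB and the simplified one about 4 MB, which matches the orders of magnitude quoted in the text.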
Figure 3. Geometric association representation.
Parallel and perpendicular relationships are expressed by the angle between offset vectors being 0 degrees or 90 degrees; the distance between two points reflects, to some extent, the width of the lane.
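A small sketch of the association representation between two elements follows. The helper names, the unsigned-angle convention, and the lane and stop-line coordinates are assumptions chosen for illustration.

```python
import math

def element_offsets(points):
    """Offset vectors between adjacent points of one (open) element."""
    return [(q[0] - p[0], q[1] - p[1]) for p, q in zip(points, points[1:])]

def association_representation(elem_a, elem_b):
    """Pairwise point distances and pairwise offset angles between two elements.

    Angles are unsigned, in degrees, in the range [0, 180].
    """
    distances = [[math.dist(p, q) for q in elem_b] for p in elem_a]
    angles = []
    for ax, ay in element_offsets(elem_a):
        row = []
        for bx, by in element_offsets(elem_b):
            dot = ax * bx + ay * by
            cross = ax * by - ay * bx
            row.append(math.degrees(math.atan2(abs(cross), dot)))
        angles.append(row)
    return distances, angles

# Hypothetical scene: two lane lines 3.5 m apart and a stop line across them.
lane_a = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
lane_b = [(0.0, 3.5), (5.0, 3.5), (10.0, 3.5)]
stop_line = [(10.0, 0.0), (10.0, 3.5)]

lane_dist, lane_angles = association_representation(lane_a, lane_b)
_, stop_angles = association_representation(lane_a, stop_line)

print(lane_angles)      # all 0 degrees: the lanes are parallel
print(stop_angles)      # all 90 degrees: the stop line is perpendicular
print(lane_dist[0][0])  # 3.5: the lane width shows up as a point distance
```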
To optimize the representation of geometric shapes and associations, we adopt the simplest approach: we directly compute the geometric representation of both predictions and labels, and use the norm of their difference as the optimization target.
The loss compares the lengths and angles computed from the labels with the lengths and angles computed from the predictions. One trick is used for the included angles: computing the angle directly involves a discontinuous arctan function, which causes difficulties during optimization (there is a vanishing gradient problem near ±90 degrees), so what we actually compare are the cosine and sine of the included angle. Because the loss is built only from lengths and angles, it is also robust to rotation and translation.
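The loss described above can be sketched as follows. This is an illustrative version under stated assumptions, not the published formula: it uses an L1 norm, treats each element as a closed polygon with no repeated consecutive points, assumes predicted and label points are already matched in order, and the names `shape_features`, `geometric_shape_loss`, and `rigid` are hypothetical.

```python
import math

def shape_features(points):
    """Offset lengths plus (cos, sin) of the angle between adjacent offsets."""
    n = len(points)
    offs = [(points[(i + 1) % n][0] - points[i][0],
             points[(i + 1) % n][1] - points[i][1]) for i in range(n)]
    lengths = [math.hypot(dx, dy) for dx, dy in offs]
    cos_sin = []
    for i in range(n):
        (ax, ay), (bx, by) = offs[i], offs[(i + 1) % n]
        denom = lengths[i] * lengths[(i + 1) % n]
        # Cosine from the dot product, sine from the 2D cross product.
        cos_sin.append(((ax * bx + ay * by) / denom, (ax * by - ay * bx) / denom))
    return lengths, cos_sin

def geometric_shape_loss(pred, label):
    """L1 distance between the shape features of a prediction and its label."""
    pred_len, pred_ang = shape_features(pred)
    label_len, label_ang = shape_features(label)
    length_term = sum(abs(p - q) for p, q in zip(pred_len, label_len))
    angle_term = sum(abs(pc - lc) + abs(ps - ls)
                     for (pc, ps), (lc, ls) in zip(pred_ang, label_ang))
    return length_term + angle_term

def rigid(points, angle_deg, tx, ty):
    """Rotate by angle_deg about the origin, then translate by (tx, ty)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

label = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
shifted = rigid(label, 40.0, 3.0, -2.0)                    # same shape, new pose
skewed = [(0.0, 0.0), (4.0, 0.0), (5.0, 2.0), (0.0, 2.0)]  # distorted shape

loss_shifted = geometric_shape_loss(shifted, label)
loss_skewed = geometric_shape_loss(skewed, label)
print(loss_shifted, loss_skewed)
```

The rigidly moved copy incurs essentially zero loss while the distorted one is penalized, which is the behavior the text attributes to a length-and-angle objective. In practice such a term would be combined with coordinate-based losses, since on its own it does not fix where the element is.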
Geometry-decoupled attention
In the architecture adopted by MapTR, PivotNet, and others, each point on a map element corresponds to one Transformer query. The problem with this architecture is that it does not distinguish between the two major categories of geometric properties. In self-attention, all queries (that is, "points") interact equally with each other. However, the shape of a map element corresponds to one group of queries, so interaction across groups becomes a liability when perceiving the shape of an element. Conversely, when perceiving the relationship between elements, shape becomes a redundant factor. This suggests that decoupling the perception of shape and association may lead to better results.
To decouple shape and association processing, we adopt a two-step self-attention process. Each map element consists of a group of queries; attention is first performed within each group, and then between elements.
Figure 4. Geometry-decoupled attention.
The left side shows the shape attention carried out within a single element, and the right side shows the association attention carried out between elements.
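One way to picture the two-step attention is through attention masks over the point queries. The sketch below only builds the two boolean masks; the function name, the flat query layout (all points of element 0, then element 1, and so on), and the choice to exclude same-element pairs from association attention are assumptions made for illustration, not details confirmed by the text.

```python
def decoupled_attention_masks(num_elements, points_per_element):
    """Build boolean masks where True means query i may attend to query j."""
    total = num_elements * points_per_element
    # Index of the map element that each point query belongs to.
    element_of = [q // points_per_element for q in range(total)]
    # Shape attention: only queries belonging to the same element interact.
    shape_mask = [[element_of[i] == element_of[j] for j in range(total)]
                  for i in range(total)]
    # Association attention: only queries from different elements interact.
    association_mask = [[element_of[i] != element_of[j] for j in range(total)]
                        for i in range(total)]
    return shape_mask, association_mask

shape_mask, association_mask = decoupled_attention_masks(
    num_elements=2, points_per_element=3)

for row in shape_mask:
    print("".join("1" if allowed else "0" for allowed in row))
```

For two elements with three points each, the shape mask is block-diagonal and the association mask is its complement. Note that some attention implementations use the opposite boolean convention (True meaning "blocked"), so the masks may need to be inverted before use.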
Experimental results
We conducted extensive experiments on the nuScenes and Argoverse 2 datasets. Both are commonly used large-scale autonomous driving datasets, and both provide map annotations.
We conducted three sets of experiments on nuScenes. First, we use a relatively pure combination of objective functions, including only the geometric losses and other necessary losses (such as point-to-point distance, edge direction, and classification). This combination aims to highlight the value of the geometric properties we propose without chasing SOTA results. The results show that our method improves mAP over MapTR in this setting. To explore the limits of GeMap, we also add some auxiliary objectives, including segmentation and depth estimation. In this setting we also achieve SOTA results (an improvement in mAP). Notably, achieving this improvement does not require sacrificing much inference speed. Finally, we also tried introducing additional LiDAR input. With the help of this additional modality, the performance of GeMap improved further.
Similarly, on the Argoverse 2 dataset, our method also achieved outstanding results.
Ablation experiments
Ablation experiments on nuScenes demonstrate the value of the geometric loss and the geometry-decoupled attention. Interestingly, as we expected, using the geometric loss directly leads to a decrease in model performance. We believe this is because the structural coupling of shape and association processing makes it difficult for the model to optimize the geometric representation; once combined with geometry-decoupled attention, the geometric loss plays its intended role (from "Euclidean Loss" to "Full").
More results
In addition, we performed a visual analysis on nuScenes. The visualization results show that GeMap is not only robust to rotation and translation, but also shows certain advantages in handling occlusion, as shown in the figure below. Challenging map elements are marked with orange boxes in the figure.
Figure 5. Visual comparison results.
Experiments on rainy-day data also quantitatively confirm the robustness to occlusion (see the table below), since rain naturally obscures the camera.
This can be explained by the model learning geometric properties and therefore being able to better infer map elements even under occlusion. For example, if the model understands the shape of lane lines, it only needs to "see" part of a line to estimate the rest; if the model understands the parallel relationship between lane lines, or the typical width of a lane, then even when one line is blocked it can infer the occluded part from the parallel relationship and the width.
We pointed out the geometric properties of map elements and their value for online vectorized high-precision map construction. Based on this, we proposed a strong method that provides an initial verification of this value. In addition, GeMap's robustness to occlusion may suggest using geometric properties to deal with occlusion in other autonomous driving tasks (such as detection and occupancy prediction), because both vehicles and roads have relatively standardized geometric properties. Of course, our method itself leaves much to explore. For example, can geometric elements of different complexity be described adaptively using different numbers of points? Can the geometric representation be understood from a probabilistic perspective to make it more robust to noise? Since we simplified the association between elements, is there a better representation of geometric association? These are all directions for further optimization.
Original link: https://mp.weixin.qq.com/s/BoxlskT68Kjb07mfwQ7Swg