


ICLR'24, a new map-free approach! LaneSegNet: map learning based on lane segment perception
Preface & the author's personal understanding
Maps are key information for downstream applications of autonomous driving systems and are usually represented by lane lines or centerlines. However, the existing map learning literature mainly focuses on either detecting geometry-based lane lines or perceiving the topological relationships of centerlines. Both approaches ignore the inherent relationship between lane lines and centerlines, namely that lane lines bind centerlines. Simply predicting both types of line in one model leads to mutually exclusive learning objectives, so this paper proposes the lane segment as a new representation that seamlessly combines geometric and topological information, and builds LaneSegNet on it. This is the first end-to-end mapping network that generates lane segments to obtain a complete representation of road structure. LaneSegNet has two key modifications. One is the lane attention module, which captures key region details within a long-range feature space. The other is the identical initialization strategy for reference points, which enhances the learning of positional priors for lane attention. On the OpenLane-V2 dataset, LaneSegNet outperforms previous counterparts on three tasks: map element detection (by 4.8 mAP), lane centerline perception (by 6.9 DET_l) and the newly defined lane segment perception (by 5.6 mAP). It also achieves a real-time inference speed of 14.7 FPS.
Open source link: https://github.com/OpenDriveLab/LaneSegNet
In summary, the main contributions of this article are as follows:
- This article introduces lane segment perception as a new map learning formulation that covers both geometric and topological elements. We hope it will bring new insights to the field.
- This article proposes LaneSegNet, an end-to-end network for lane segment perception. It contains two new modifications: a lane attention module with a heads-to-regions mechanism to capture long-range attention, and an identical initialization strategy for reference points to enhance the learning of the positional prior for lane attention.
Review of Related Work
Centerline perception: Centerline perception from vehicle-mounted sensor data (equivalent to lane graph learning in this paper) has attracted a great deal of attention recently. STSU proposed a DETR-like network to detect centerlines, followed by a multilayer perceptron (MLP) module to determine their connectivity. Building on STSU, Can et al. introduced an additional minimal cycle query to ensure the correct order of overlapping lines. CenterLineDet treats centerlines as vertices and designs a graph update model trained through imitation learning. It is worth noting that Tesla proposed the concept of a "lane language" that expresses the lane graph as a sentence. Their attention-based model recursively predicts lane markings and their connectivity. In addition to these segment-wise methods, LaneGAP introduces a path-wise method that uses an additional transformation algorithm to recover the lane graph. TopoNet targets complete and diverse driving scene graphs, explicitly models the connectivity of centerlines within the network, and incorporates traffic elements into the task. In this work, we adopt the segment-wise method to construct lane graphs. However, we differ from previous methods in modeling Lane Segments, rather than centerlines, as the vertices of the lane graph, which allows convenient integration of segment-level geometric and semantic information.
Map element detection: Earlier work focused on lifting map element detection from the camera plane to 3D space to overcome projection errors. With the growing popularity of BEV perception, recent works focus on learning HD maps using segmentation and vectorization methods. Map segmentation predicts the semantics of each BEV grid cell, such as lanes, crosswalks, and drivable areas. These works mainly differ in their perspective view (PV) to BEV conversion modules. However, segmented maps cannot provide information that downstream modules can use directly. HDMapNet handles this problem by grouping and vectorizing segmentation maps with complex post-processing.
Although dense segmentation provides pixel-level information, it still cannot capture the complex relationships of overlapping elements. VectorMapNet proposes to represent each map element directly as a sequence of points, using coarse keypoints to sequentially decode lane locations. MapTR explores a unified permutation-based point sequence modeling approach to eliminate modeling ambiguity and improve performance and efficiency. PivotNet further models map elements using a pivot-based representation in a set prediction framework to reduce redundancy and improve accuracy. StreamMapNet utilizes multi-point attention and temporal information to improve the stability of long-range map element detection. In fact, since vectorization also encodes the direction of lanes, vectorization-based methods can be easily adapted to centerline perception by switching the supervision. In this work, we propose a unified, easy-to-learn representation, the lane segment, for all HD map elements on a road.
Detailed explanation of LaneSegNet
Lane Segment Perception Task Description
An instance of a Lane Segment contains both geometric and semantic aspects of the road. In terms of geometry, it can be represented as a line segment consisting of a vectorized centerline and its corresponding left and right lane boundaries. Each line is defined as an ordered collection of points in 3D space. Alternatively, the geometry can be described as a closed polygon that defines the drivable area within that lane.
In terms of semantics, it includes the Lane Segment category C (e.g., Lane Segment, pedestrian crossing) and the line style of the left and right lane boundaries (e.g., invisible, solid, dashed). These details provide autonomous vehicles with important insights into deceleration requirements and the feasibility of lane changes.
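The two geometric views are interchangeable, which a small sketch can illustrate. The function name lane_segment_polygon and the sample coordinates are hypothetical, and the vertex ordering (left boundary forward, right boundary backward) is one common convention rather than something the text specifies.

```python
def lane_segment_polygon(left_boundary, right_boundary):
    """Build a closed ring from two boundaries given as ordered 3D points.

    Assumption: both boundaries run in the driving direction, so the ring
    walks the left boundary forward and the right boundary backward.
    """
    ring = list(left_boundary) + list(reversed(right_boundary))
    ring.append(ring[0])  # repeat the first vertex to close the polygon
    return ring


# Hypothetical straight lane, 10 m long and 3 m wide.
left_pts = [(0.0, 1.5, 0.0), (10.0, 1.5, 0.0)]
right_pts = [(0.0, -1.5, 0.0), (10.0, -1.5, 0.0)]
polygon = lane_segment_polygon(left_pts, right_pts)
print(polygon)
```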
In addition, topology information plays a crucial role in path planning. To represent this information, a lane graph is constructed for Lane Segment, represented as G = (V, E). Each Lane Segment is a node in the graph, represented by the set V, and the edges in the set E describe the connectivity between Lane Segments. We use an adjacency matrix to store this lane graph, where matrix element (i, j) is set to 1 only when the j-th Lane Segment follows the i-th Lane Segment; otherwise, it remains 0.
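As a minimal sketch of this adjacency-matrix encoding, the snippet below builds the matrix from a list of successor pairs. The helper name build_lane_graph and the four-segment scene are illustrative assumptions, not part of LaneSegNet.

```python
def build_lane_graph(num_segments, successors):
    """Return adjacency matrix A with A[i][j] == 1 iff segment j follows segment i."""
    adjacency = [[0] * num_segments for _ in range(num_segments)]
    for i, j in successors:
        adjacency[i][j] = 1
    return adjacency


# Hypothetical scene: segment 0 splits into 1 and 2, which both merge into 3.
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
adjacency = build_lane_graph(4, edges)
for row in adjacency:
    print(row)
```

The matrix is directed: entry (0, 1) is set while entry (1, 0) stays 0, because segment 1 follows segment 0 and not the other way round.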
LaneSegNet Framework
The overall framework of LaneSegNet is shown in Figure 2. LaneSegNet takes surround-view images as input to perceive Lane Segments within a specific BEV range. In this section, we first briefly introduce the LaneSeg encoder used to generate BEV features. Then, we introduce the Lane Segment decoder and lane attention. Finally, we present the Lane Segment predictor along with the training losses.
LaneSeg Encoder
The encoder converts the surround-view images into BEV features for Lane Segment extraction. We use a standard ResNet-50 backbone to derive feature maps from the raw images. The PV-to-BEV encoder module of BEVFormer then performs the view conversion.
LaneSeg Decoder
Transformer-based detection methods use a decoder to gather features from the BEV features and update the decoder queries layer by layer. Each decoder layer uses self-attention, cross-attention, and a feed-forward network to update the queries. Learnable positional queries are also employed. The updated queries are then output and fed to the next stage.
Because map geometries are complex and elongated, collecting long-range BEV features is crucial for online mapping tasks. Previous work uses hierarchical (instance-point) decoder queries and deformable attention to extract local features for each point query. Although this approach sidesteps the need to capture long-range information, it comes with a high computational cost due to the increased number of queries.
The Lane Segment, as a lane instance representation for constructing scene graphs, has favorable characteristics at the instance level. Our goal is to represent Lane Segments with single instance queries rather than multi-point queries. The core challenge is therefore how a single instance query can attend to global BEV features.
Lane Attention: In object detection, deformable attention exploits the positional prior of the object and attends only to a small set of attention values near the object's reference point as a pre-filter, which greatly accelerates convergence. During layer iterations, a reference point is placed at the center of the predicted object to refine the sampling locations of the attention values, which are scattered around the reference point via learnable sampling offsets. The sampling offsets are deliberately initialized to encode the geometric prior of a 2D object. In this way, the multi-head mechanism can capture the features in each direction well, as shown in Figure 3a.
In the context of map learning, Li et al. used naive deformable attention to predict centerlines. However, as shown in Figure 3b, the naive placement of the reference points may prevent it from obtaining long-range attention. Furthermore, because the target is elongated and the visual cues are complex (e.g., accurately predicting breakpoints between solid and dashed lines), this process requires additional adaptive design for our task. Considering all these characteristics, the network needs to be able not only to attend to long-range contextual information, but also to extract local details accurately. The sampling locations should therefore be distributed over a large area to perceive long-range information effectively. At the same time, local details should be easily distinguishable so that key points can be identified. It is worth noting that although value features compete with each other within a single attention head, value features in different heads can be retained during the attention process. It is therefore promising to exploit this property explicitly to promote attention to the local features of a specific region.
To this end, this article proposes a heads-to-regions mechanism. We first distribute multiple reference points evenly within the Lane Segment area. The sampling locations are then initialized around each reference point in its local area. To preserve complex local details, we use the multi-head mechanism, where each head focuses on a specific set of sampling locations within a local area, as shown in Figure 3c.
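The first step of the heads-to-regions mechanism, spreading reference points evenly over a Lane Segment, can be sketched as follows. This is only an illustration under stated assumptions: the points are placed along a 2D centerline polyline at equal arc-length intervals, one per attention head, and the function name evenly_spaced_points is hypothetical. How the actual model distributes points within the segment area may differ.

```python
import math


def evenly_spaced_points(polyline, num_points):
    """Sample num_points (>= 2) along a 2D polyline at equal arc-length steps."""
    # Cumulative arc length at every vertex of the polyline.
    cumulative = [0.0]
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        cumulative.append(cumulative[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cumulative[-1]

    samples = []
    seg = 0
    for k in range(num_points):
        target = total * k / (num_points - 1)
        # Advance to the polyline piece that contains the target length.
        while seg < len(polyline) - 2 and cumulative[seg + 1] < target:
            seg += 1
        span = cumulative[seg + 1] - cumulative[seg]
        t = 0.0 if span == 0 else (target - cumulative[seg]) / span
        x0, y0 = polyline[seg]
        x1, y1 = polyline[seg + 1]
        samples.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return samples


# Hypothetical L-shaped centerline; one reference point per head, 5 heads.
reference_points = evenly_spaced_points([(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)], 5)
print(reference_points)
```

Each head would then initialize its sampling locations around its own reference point, so that different heads cover different regions of the elongated segment.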
The lane attention module takes as input the BEV features, the i-th Lane Segment query feature q_i, and a set of reference points p_i.
Identical initialization of reference points: The location of the reference points is the determining factor for the functionality of the lane attention module. In order to align the area of interest of each instance query with its actual geometry and location, the reference points p in each instance query are distributed based on the Lane Segment prediction of the previous layer, as shown in Figure 3c, and the predictions are refined iteratively.
Previous work argued that the reference points provided to the first layer should be individually initialized with learnable priors derived from positional query embeddings. However, since the positional query is independent of the input image, this initialization method may in turn limit the model's ability to remember geometric and positional priors, and incorrectly generated initial locations can also hinder training.
Therefore, for the first layer of the Lane Segment decoder, we propose an identical initialization strategy. In the first layer, every head takes the same reference point generated by the positional query. Compared with the distributed initialization of reference points in traditional methods (i.e., initializing multiple reference points for each query), identical initialization makes the learning of positional priors more stable by filtering out the interference of complex geometries. Identical initialization may seem counter-intuitive, but it has been observed to be effective.
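A plausible form of this computation, assuming lane attention follows the usual multi-head deformable attention pattern with one reference point p_{ij} per head j, is sketched below. The symbols M (number of heads), K (sampling locations per head), W_j and W'_j (learnable projections), A_{ijk} (attention weight) and \Delta p_{ijk} (sampling offset) are assumptions introduced here for illustration and are not defined in the text.

```latex
\mathrm{LaneAttn}(q_i, p_i, \mathrm{BEV})
  = \sum_{j=1}^{M} W_j \left[ \sum_{k=1}^{K} A_{ijk} \, W'_j \,
    \mathrm{BEV}\!\left(p_{ij} + \Delta p_{ijk}\right) \right]
```

Under this reading, A_{ijk} and \Delta p_{ijk} are both predicted from the query feature q_i, and the inner sum is restricted to the local region around the reference point of head j, which is what ties each head to one region.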
LaneSeg Predictor
We use MLPs in multiple prediction branches to generate the final predicted Lane Segments from the Lane Segment queries, taking into account geometric, semantic and topological aspects.
For geometry, we first design a centerline regression branch to regress the vectorized point positions of the centerline in three-dimensional coordinates. Because the left and right lane boundaries are symmetric about the centerline, we introduce an offset branch that predicts an offset for each centerline point. The left and right lane boundary coordinates can then be obtained by applying the predicted offset symmetrically on either side of the centerline.
Since Lane Segments can be conceptualized as drivable areas, we integrate an instance segmentation branch into the predictor. In terms of semantics, three classification branches predict in parallel the classification score of the category C and the scores of the left and right boundary line styles. The topological branch takes the updated query features as input and uses an MLP to output a weighted adjacency matrix of the lane graph G.
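The symmetric construction can be illustrated with a short sketch. It assumes that the offset is a per-point 3D vector with the same number of points as the centerline, and that the left boundary is the centerline plus the offset while the right boundary is the centerline minus it. The sign convention, the function name boundaries_from_offset and the sample values are all assumptions.

```python
def boundaries_from_offset(centerline, offsets):
    """Recover left/right boundaries from centerline points and per-point offsets.

    Assumed convention: left = centerline + offset, right = centerline - offset.
    """
    left = [tuple(c + o for c, o in zip(point, delta))
            for point, delta in zip(centerline, offsets)]
    right = [tuple(c - o for c, o in zip(point, delta))
             for point, delta in zip(centerline, offsets)]
    return left, right


# Hypothetical lane with a half-width of 1.75 m along the y axis.
center_pts = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
offset_pts = [(0.0, 1.75, 0.0), (0.0, 1.75, 0.0)]
left_boundary, right_boundary = boundaries_from_offset(center_pts, offset_pts)
print(left_boundary)
print(right_boundary)
```

Predicting one offset instead of two independent boundaries halves the number of regressed lines and keeps the boundaries consistent with the centerline by construction.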
Training Loss
LaneSegNet adopts a DETR-like paradigm, using the Hungarian algorithm to efficiently compute an optimal one-to-one assignment between predictions and ground truth. The training loss is then calculated based on the assignment results. The loss function consists of four parts: geometric loss, classification loss, lane line classification loss and topological loss.
The geometric loss supervises the geometric structure of each predicted Lane Segment. According to the bipartite matching result, a ground-truth (GT) Lane Segment is assigned to each predicted vectorized Lane Segment. The vectorized geometric loss is defined as the Manhattan distance calculated between assigned Lane Segment pairs.
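The matching step and the Manhattan-distance loss can be sketched on a toy example. Note two simplifications that are assumptions of this sketch: the assignment is found by brute force over permutations rather than by the Hungarian algorithm (both return an optimal one-to-one assignment, but brute force is only practical for a handful of instances), and the matching cost uses the geometric term alone. The names manhattan and match_predictions are hypothetical.

```python
from itertools import permutations


def manhattan(pred, gt):
    """Sum of absolute coordinate differences between two equal-length polylines."""
    return sum(abs(a - b) for p, g in zip(pred, gt) for a, b in zip(p, g))


def match_predictions(preds, gts):
    """Optimal one-to-one assignment by exhaustive search.

    Requires len(preds) >= len(gts). Returns (pred_index, gt_index) pairs
    and the total Manhattan cost of the best assignment.
    """
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(manhattan(preds[pi], gts[gi]) for gi, pi in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return [(pi, gi) for gi, pi in enumerate(best_perm)], best_cost


# Hypothetical scene: three predicted lines, two ground-truth lines (2D for brevity).
preds = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 4.0), (10.0, 4.0)], [(0.0, 8.0), (10.0, 8.0)]]
gts = [[(0.0, 3.5), (10.0, 3.5)], [(0.0, 0.5), (10.0, 0.5)]]
pairs, geometric_loss = match_predictions(preds, gts)
print(pairs, geometric_loss)
```

In this example prediction 1 is matched to GT 0 and prediction 0 to GT 1, and prediction 2 is left unmatched, as a DETR-style set prediction would treat it as background.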
Experimental results
Main experimental results
Lane Segment perception: In Table 1, we compare LaneSegNet with several state-of-the-art methods (MapTR, MapTRv2 and TopoNet) on the newly introduced Lane Segment perception benchmark. Their models are retrained with our Lane Segment labels. LaneSegNet outperforms other methods by up to 9.6% in mAP, and the average distance error is relatively reduced by 12.5%. LaneSegNet-mini also outperforms previous methods, with a higher FPS of 16.2.
Qualitative results are shown in Figure 4:
Map element detection: For a fair comparison with map element detection methods, we decompose LaneSegNet's predicted Lane Segments into pairs of lane lines and then compare them with state-of-the-art methods using map element detection metrics. We feed the decomposed lane line and crosswalk labels into several state-of-the-art methods for retraining. The experimental results in Table 2 show that LaneSegNet consistently outperforms other methods in map element detection. Under a fair comparison, LaneSegNet recovers road geometry better with the additional supervision. This shows that the Lane Segment learning representation is good at capturing road geometric information.
Centerline perception: We also compare LaneSegNet with state-of-the-art centerline perception methods in Table 3. For consistency, centerlines are also extracted from Lane Segments for retraining. LaneSegNet performs significantly better than other methods in the lane graph perception task. With additional geometric supervision, LaneSegNet also demonstrates superior topological reasoning capabilities, which shows that reasoning ability is closely tied to strong localization and detection capabilities.
Ablation Experiment
Lane Segment formulation: In Table 4, we provide ablations to verify the design advantages and training efficiency of our proposed Lane Segment learning formulation. Compared to the separately trained models in the first two rows, joint training of centerlines and map elements brings an overall average improvement of 1.3 on the two main metrics, as shown in row 4, demonstrating the feasibility of multi-task training. However, the common approach of training centerlines and map elements in a single branch by adding extra categories leads to significant performance degradation. Compared with this naive single-branch method, our model trained with Lane Segment labels achieves a significant performance gain (7.2 on OLS and 4.4 on mAP between rows 3 and 5), which verifies the positive interaction between the various kinds of road information in our map learning formulation. Our model even outperforms multi-branch methods, especially in centerline perception (4.8 on OLS). This shows that geometry can guide topological reasoning in our map learning formulation, whereas the multi-branch model only slightly outperforms the CL-only model (0.6 OLS between rows 1 and 4). The small decrease that remains stems from the reformatting of our prediction results and is caused by line classification errors.
Lane Attention Module: The ablation of the attention module is shown in Table 5. To facilitate a fair comparison, we replace the lane attention module in the framework with alternative attention designs. With our careful design, LaneSegNet with lane attention significantly outperforms these alternatives (mAP improves by 3.9 and TOP_ll improves by 1.2 compared to row 1). Furthermore, the decoder latency is reduced (from 23.45 ms to 20.96 ms) because fewer queries are needed than in the hierarchical query design.
Conclusion
This paper proposes Lane Segment perception as a new map learning formulation, and proposes LaneSegNet, an end-to-end network designed specifically for this problem. In addition to the network, two enhancements are proposed: a lane attention module that employs a heads-to-regions mechanism to capture long-range attention, and an identical initialization strategy for reference points to enhance the learning of the positional prior for lane attention. Experimental results on the OpenLane-V2 dataset demonstrate the effectiveness of our design.
Limitations and Future Work. Due to computational limitations, we do not extend the proposed LaneSegNet to additional backbones. The Lane Segment perception formulation and LaneSegNet may benefit downstream tasks, which is worth exploring in the future.