I recently came across a paper from 2019:
GSLAM: A General SLAM Framework and Benchmark
Open source code: https://github.com/zdzhaoyong/GSLAM
The full text follows, so you can judge the quality of this work for yourself~
SLAM technology has achieved many successes recently and has attracted the attention of high-tech companies. However, how to effectively benchmark speed, robustness, and portability through interfaces to existing or emerging algorithms remains a problem. In this paper, a new SLAM platform called GSLAM is proposed, which not only provides evaluation functions but also gives researchers useful tools to quickly develop their own SLAM systems. The core contribution of GSLAM is a universal, cross-platform, fully open source SLAM interface designed to handle the interaction of input datasets, SLAM implementations, visualizations and applications in a unified framework. Through this platform, users can implement their own functions as plug-ins to improve SLAM performance and push SLAM further into practical applications.
Simultaneous localization and mapping (SLAM) has been a hot research topic in computer vision and robotics since the 1980s. SLAM provides essential functionality for many applications that require real-time navigation, such as robotics, unmanned aerial vehicles (UAVs), autonomous driving, and virtual and augmented reality. In recent years, SLAM technology has developed rapidly, and various SLAM systems have been proposed, including monocular SLAM systems (feature point-based, direct and semi-direct methods), multi-sensor SLAM systems (RGBD, binocular and inertial-assisted methods) and learning-based SLAM systems (supervised and unsupervised methods).
However, with the rapid development of SLAM technology, almost all researchers focus on the theory and implementation of their own SLAM systems, which makes it difficult to exchange ideas and hard to port an implementation to other systems. This hinders the rapid application of SLAM technology in various industry fields. Furthermore, there are currently many different implementations of SLAM systems, and how to effectively benchmark speed, robustness, and portability remains an issue. Recently, Nardi et al. and Bodin et al. proposed unified SLAM benchmark systems to conduct quantitative, comparable, and verifiable experimental studies and to explore the trade-offs between various SLAM systems. These systems make it easy to run evaluation experiments using datasets and metric evaluation modules.
Since existing systems only provide evaluation benchmarks, the paper argues that it is possible to establish a platform that serves the entire life cycle of SLAM algorithms, including the development, evaluation, and application stages. In addition, deep learning-based SLAM has made significant progress in recent years, so the platform should support not only C++ but also Python to better integrate geometric and deep learning-based SLAM systems. Therefore, the paper introduces a novel SLAM platform that not only provides evaluation capabilities but also gives researchers useful tools to quickly develop their own SLAM systems. Commonly used functions are provided as plug-ins, so users can use them directly or create their own for better performance. It is hoped that this platform can further promote the practical application of SLAM systems. In summary, the main contributions of the paper are the unified, cross-platform, open source SLAM interface; the utility components (Estimator, Optimizer and Vocabulary) that speed up development; and the benchmark evaluation of popular SLAM implementations built on that interface.
The rest of this article first introduces the interface of the GSLAM framework and explains how GSLAM works. Next, three practical components are introduced, namely Estimator, Optimizer and Vocabulary. Then, several typical public datasets are used to evaluate popular SLAM implementations within the GSLAM framework. Finally, the work is summarized and future research directions are outlined.
SLAM technology is used to build a map of an unknown environment and to localize the sensor within that map, with the main focus on real-time operation. Early SLAM was mainly based on the extended Kalman filter (EKF): the 6-degree-of-freedom motion parameters and the 3D landmarks are represented probabilistically as a single state vector. The complexity of the classic EKF grows quadratically with the number of landmarks, which limits its scalability. In recent years, SLAM technology has developed rapidly, and many monocular visual SLAM systems have been proposed, including feature point-based, direct and semi-direct methods. However, monocular SLAM systems lack scale information and cannot handle pure rotation, so other multi-sensor SLAM systems, including RGBD, binocular and inertial-assisted methods, emerged to improve robustness and accuracy.
Although a large number of SLAM systems have been proposed, there has been little work on unifying the interfaces of these algorithms and no comprehensive comparison of their performance. Furthermore, implementations of these SLAM algorithms are often released as standalone executables rather than libraries, and often do not conform to any standard structure.
Recently, supervised and unsupervised visual odometry (VO) methods based on deep learning have introduced novel ideas compared with traditional geometry-based methods. However, further optimizing the consistency of multiple keyframes is still not easy. GSLAM provides tools that can help achieve better global consistency. Through this framework, it is easier to visualize or evaluate the results and to apply them in various industry sectors.
In robotics, the Robot Operating System (ROS) provides a very convenient way for nodes to communicate and is favored by most robotics researchers. Many SLAM implementations provide ROS wrappers to subscribe to sensor data and publish visualization results. However, ROS does not unify the input and output of SLAM implementations, which makes it difficult to evaluate different SLAM systems further.
Inspired by the ROS message architecture, GSLAM implements a similar intra-process communication utility class called Messenger. It offers an alternative to ROS inside a SLAM implementation while maintaining compatibility: all ROS-defined messages are supported within the framework, and ROS wrappers follow naturally. Thanks to the in-process design, messages are delivered without serialization or data transfer, so they can be sent without delay or additional cost. At the same time, the payload of a message is not limited to ROS-defined messages and can be any copyable data structure. Furthermore, GSLAM not only provides evaluation capabilities but also gives researchers useful tools to quickly develop and integrate their own SLAM algorithms.
Currently there are several SLAM benchmark systems, including the KITTI benchmark, the TUM RGB-D benchmark and the ICL-NUIM RGB-D benchmark dataset, but these systems only provide evaluation functions. In addition, SLAMBench2 extends these benchmarks to cover algorithms and datasets, but it requires users to make published implementations compatible with SLAMBench2 for evaluation, which is difficult to extend to more application areas. Unlike these systems, the GSLAM platform proposed in this paper provides a solution that can serve the entire life cycle of a SLAM implementation, from development to evaluation to application. It gives researchers useful tools to quickly develop their own SLAM systems and to build visualizations, evaluations and applications on a unified interface.
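The framework of GSLAM is shown in the figure. Overall, the interface is designed to handle the interaction of three parts: the input datasets, the SLAM implementation, and the visualizations and applications built on its results.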
The framework is designed to be compatible with many different types of SLAM implementations, including but not limited to monocular, binocular, RGBD, and multi-camera visual-inertial odometry with multi-sensor fusion. Modern deep learning platforms and developers prefer to code in Python, so GSLAM provides Python bindings, enabling developers to implement SLAM in Python and call it from GSLAM, or to call C++-based SLAM implementations from Python. Additionally, JavaScript is supported for web-based uses.
Data structures commonly used by SLAM interfaces include parameter setting and reading, image formats, pose transformations, camera models and map data structures. The following is a brief introduction to some of the basic interface classes.
Parameter Setting
GSLAM uses a small parameter parsing and parameter setting class called Svar, which consists of a single header file, relies only on C++11, and has the following characteristics:
a. Parameter parsing, configuration loading and help information. Similar to popular parameter parsing tools such as Google gflags, variable configurations can be loaded from command line arguments, files, and the system environment. Users can also define different types of parameters and provide introductory information, which will be displayed in the help document.
b. A small script language that supports variables, functions and conditional statements to make configuration files more powerful.
c. Thread-safe variable binding and sharing. It is recommended to bind frequently used variables to pointers or references, which is both efficient and convenient.
d. Simple function definition and calling from C++ or pure script. Bindings between commands and functions help developers decouple file dependencies.
e. Supports tree structure representation, which means configurations can be easily loaded or saved using XML, JSON and YAML formats.
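The layered loading and tree representation described in items a and e can be illustrated with a small sketch. This is not the Svar API: `ParamStore`, its methods, the `GSLAM_` environment prefix and the `name=value` argument syntax are all hypothetical names chosen for the illustration. It only shows the idea of declared defaults being overridden by a file, then the environment, then the command line.

```python
import json


class ParamStore:
    """Hypothetical layered parameter store (an illustration, not the Svar API)."""

    def __init__(self):
        self._values = {}  # dotted name -> current value
        self._help = {}    # dotted name -> description shown in help text

    def declare(self, name, default, description=""):
        self._values[name] = default
        self._help[name] = description

    def _set(self, name, raw):
        # Coerce raw input to the type of the declared default, if any.
        if name in self._values and not isinstance(raw, type(self._values[name])):
            raw = type(self._values[name])(raw)
        self._values[name] = raw

    def load_json(self, text):
        # Flatten a nested JSON document into dotted parameter names.
        def walk(node, prefix):
            for key, value in node.items():
                name = f"{prefix}.{key}" if prefix else key
                if isinstance(value, dict):
                    walk(value, name)
                else:
                    self._set(name, value)
        walk(json.loads(text), "")

    def load_env(self, environ, prefix="GSLAM_"):
        # Assumed convention: GSLAM_CAMERA__FPS maps to "camera.fps".
        for key, value in environ.items():
            if key.startswith(prefix):
                self._set(key[len(prefix):].lower().replace("__", "."), value)

    def parse_args(self, argv):
        # Assumed syntax: each argument is "name=value".
        for token in argv:
            name, sep, value = token.partition("=")
            if sep:
                self._set(name, value)

    def get(self, name):
        return self._values[name]

    def to_tree(self):
        # Rebuild the nested tree so it can be saved as JSON, YAML or XML.
        tree = {}
        for name, value in sorted(self._values.items()):
            *parents, leaf = name.split(".")
            node = tree
            for part in parents:
                node = node.setdefault(part, {})
            node[leaf] = value
        return tree


store = ParamStore()
store.declare("camera.fps", 30, "camera frame rate")
store.declare("dataset.path", "", "input dataset location")
store.load_json('{"camera": {"fps": 25}}')            # file overrides default
store.load_env({"GSLAM_DATASET__PATH": "/data/seq"})  # environment overrides file
store.parse_args(["camera.fps=60"])                   # command line wins
print(json.dumps(store.to_tree()))
```

Later sources simply overwrite earlier ones, which keeps the precedence rule easy to reason about.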
Intra-Process Messaging
Because ROS provides a very convenient communication method between nodes, it is favored by most robotics researchers. Inspired by the ROS2 message architecture, GSLAM implements a similar intra-process communication utility class called Messenger. This provides an alternative to ROS within the SLAM implementation while maintaining compatibility. Thanks to its intra-process design, Messenger can publish and subscribe to any class at no additional cost. Its main features are:
a. The interface adopts the ROS style, which is easy for users to use. And it supports all ROS-defined messages, which means that it requires very little work to replace the original ROS messaging system.
b. Since there is no serialization and data transfer, messages can be sent without delay and additional cost. At the same time, the payload of a message is not limited to ROS-defined messages, but also supports any copyable data structure.
c. The source code consists only of C++11-based header files with no additional dependencies, making it portable.
d. The API is thread-safe and supports multi-threaded condition notification when the queue size is greater than zero. Before a publisher and a subscriber are connected, the topic name and the RTTI type information are checked to ensure the call is correct.
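The points above (ROS-style topics, no serialization, type checking before connection) can be sketched in a few lines. The class below is a hypothetical stand-in written for this article, not GSLAM's Messenger; the method names `subscribe` and `advertise` merely follow ROS conventions, and the `Frame` payload and topic name are made up.

```python
import threading
from collections import defaultdict


class Messenger:
    """Hypothetical in-process publish/subscribe hub (not the GSLAM API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._topic_types = {}               # topic name -> payload type
        self._callbacks = defaultdict(list)  # topic name -> subscriber callbacks

    def _register(self, topic, payload_type):
        # Check the topic name against its payload type before connecting.
        with self._lock:
            known = self._topic_types.setdefault(topic, payload_type)
        if known is not payload_type:
            raise TypeError(
                f"topic {topic!r} carries {known.__name__}, not {payload_type.__name__}"
            )

    def subscribe(self, topic, payload_type, callback):
        self._register(topic, payload_type)
        with self._lock:
            self._callbacks[topic].append(callback)

    def advertise(self, topic, payload_type):
        self._register(topic, payload_type)

        def publish(message):
            if not isinstance(message, payload_type):
                raise TypeError(f"expected {payload_type.__name__}")
            with self._lock:
                callbacks = list(self._callbacks[topic])
            for callback in callbacks:
                callback(message)  # the same object is handed over, no copy
            return len(callbacks)

        return publish


class Frame:
    """Made-up payload; any Python object can be sent."""

    def __init__(self, frame_id):
        self.frame_id = frame_id


received = []
hub = Messenger()
hub.subscribe("images/left", Frame, received.append)
publish = hub.advertise("images/left", Frame)
frame = Frame(7)
delivered = publish(frame)
print(delivered, received[0] is frame)
```

Because the subscriber receives the very object that was published, the cost of a message is one function call, which is the property the in-process design is after.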
3D Transformation
For the rotation part, there are several representations to choose from, including matrices, Euler angles, unit quaternions and the Lie algebra so(3). Any of them can represent a given transformation, and they can be converted into one another. However, when composing multiple transformations or performing manifold optimization, the choice of representation needs close attention. The matrix representation is over-parameterized, using 9 parameters while a rotation has only 3 degrees of freedom (DOF). The Euler angle representation uses three variables and is easy to understand, but it suffers from gimbal lock and is inconvenient for composing multiple transformations. Unit quaternions are the most efficient way to compose multiple rotations, while Lie algebras are the common representation for manifold optimization. Similarly, the Lie algebras se(3) and sim(3) are defined for rigid-body and similarity transformations. GSLAM uses quaternions to represent the rotation part and provides functions to convert between representations. Table 1 benchmarks GSLAM's transformation implementation against three other manifold implementations (Sophus, TooN and Ceres). Since the Ceres implementation uses the angle-axis representation, the exponential and logarithm of the rotation are not required. As shown in the table, GSLAM's implementation performs better because it uses quaternions and is better optimized, while TooN uses a matrix implementation and performs better on point transformations.
Image Format
The storage and transmission of image data is one of the most important functions in visual SLAM. To improve efficiency and convenience, GSLAM uses a data structure called GImage, which is compatible with cv::Mat. It uses a reference-counted smart pointer to ensure memory is released safely, so images can be passed around easily without memory copying. Data pointers are aligned to make Single Instruction Multiple Data (SIMD) acceleration easier. Users can convert between GImage and cv::Mat seamlessly and safely without memory copying.
Camera Models
Images used in SLAM may contain radial and tangential distortion caused by manufacturing imperfections, or may be captured by fisheye or panoramic cameras, so different camera models have been proposed to describe the projection. GSLAM provides implementations including OpenCV (used by ORB-SLAM), ATAN (used by PTAM) and OCamCalib (used by MultiCol-SLAM). Users can also easily inherit from these classes and implement other camera models such as Kannala-Brandt and isometric panoramic models.
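As a rough illustration of how camera models can share one interface and be extended by inheritance, the sketch below defines a base class with `project` and `unproject`, a plain pinhole model, and an ATAN (FOV) model of the kind PTAM uses, where the distorted radius is r_d = atan(2 r tan(w/2)) / w. The class and method names are assumptions made for this example and are not taken from GSLAM's headers.

```python
import math


class Camera:
    """Hypothetical base class: maps between 3D points and pixels."""

    def project(self, point):
        raise NotImplementedError

    def unproject(self, pixel):
        raise NotImplementedError


class PinholeCamera(Camera):
    def __init__(self, fx, fy, cx, cy):
        self.fx, self.fy, self.cx, self.cy = fx, fy, cx, cy

    def _distort(self, r):
        return 1.0  # ideal pinhole: no radial scaling

    def _undistort(self, rd):
        return 1.0

    def project(self, point):
        x, y, z = point
        xn, yn = x / z, y / z                   # normalized image plane
        s = self._distort(math.hypot(xn, yn))   # radial scale factor
        return (self.fx * s * xn + self.cx, self.fy * s * yn + self.cy)

    def unproject(self, pixel):
        u, v = pixel
        xd, yd = (u - self.cx) / self.fx, (v - self.cy) / self.fy
        s = self._undistort(math.hypot(xd, yd))
        return (s * xd, s * yd, 1.0)            # ray with unit depth


class AtanCamera(PinholeCamera):
    """ATAN (FOV) distortion: r_d = atan(2 r tan(w / 2)) / w."""

    def __init__(self, fx, fy, cx, cy, w):
        super().__init__(fx, fy, cx, cy)
        self.w = w
        self._k = 2.0 * math.tan(w / 2.0)

    def _distort(self, r):
        if r < 1e-9:
            return self._k / self.w             # limit of r_d / r as r -> 0
        return math.atan(r * self._k) / (self.w * r)

    def _undistort(self, rd):
        if rd < 1e-9:
            return self.w / self._k             # limit of r / r_d as r_d -> 0
        return math.tan(rd * self.w) / (self._k * rd)


camera = AtanCamera(fx=400.0, fy=400.0, cx=320.0, cy=240.0, w=0.9)
pixel = camera.project((0.3, -0.2, 2.0))
ray = camera.unproject(pixel)
print(pixel, ray)
```

Only the two radial scaling hooks differ between the models, so adding another radially symmetric model in this sketch means overriding those two methods.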
Map Data Structure
For SLAM implementation, the goal is to locate and generate maps in real time. GSLAM recommends using a unified map data structure consisting of multiple map frames and map points. This data structure is suitable for most existing visual SLAM systems, including feature-based or direct methods.
Map frames represent the pose state at different times and hold the information captured by sensors or produced by estimation, such as raw IMU or GPS data, depth information, and camera models. The SLAM implementation estimates the relationships between frames, and the connections between them form a pose graph.
Map points are used to represent the environment observed by frames, typically used by feature-based methods. However, a map point can represent not only a keypoint, but also a GCP (Ground Control Point), edge line, or 3D object. Their correspondence to map frames forms an observation graph, often called a bundle graph.
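A minimal sketch of this map structure, under assumed names, might look as follows. `MapFrame`, `MapPoint` and `Map` mirror the terms used above, but the fields and methods are hypothetical and much simpler than anything a real SLAM system would need.

```python
from dataclasses import dataclass, field


@dataclass
class MapFrame:
    frame_id: int
    timestamp: float
    # Pose as translation plus unit quaternion: (x, y, z, qx, qy, qz, qw).
    pose: tuple = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0)
    observations: dict = field(default_factory=dict)  # point_id -> (u, v)
    neighbors: set = field(default_factory=set)       # pose-graph edges


@dataclass
class MapPoint:
    point_id: int
    position: tuple
    observers: set = field(default_factory=set)       # frame ids that see it


class Map:
    """Hypothetical container holding the pose graph and the observation graph."""

    def __init__(self):
        self.frames = {}
        self.points = {}

    def insert_frame(self, frame):
        self.frames[frame.frame_id] = frame

    def insert_point(self, point):
        self.points[point.point_id] = point

    def connect(self, frame_a, frame_b):
        # Add an undirected pose-graph edge between two frames.
        self.frames[frame_a].neighbors.add(frame_b)
        self.frames[frame_b].neighbors.add(frame_a)

    def add_observation(self, frame_id, point_id, pixel):
        # Add an observation-graph edge between a frame and a point.
        self.frames[frame_id].observations[point_id] = pixel
        self.points[point_id].observers.add(frame_id)

    def covisible(self, frame_id):
        # Count the map points shared with every other frame.
        shared = {}
        for point_id in self.frames[frame_id].observations:
            for other in self.points[point_id].observers:
                if other != frame_id:
                    shared[other] = shared.get(other, 0) + 1
        return shared


world = Map()
for i in range(3):
    world.insert_frame(MapFrame(frame_id=i, timestamp=i / 30.0))
for j in range(2):
    world.insert_point(MapPoint(point_id=j, position=(float(j), 0.0, 5.0)))
world.connect(0, 1)
world.add_observation(0, 0, (100.0, 120.0))
world.add_observation(0, 1, (150.0, 118.0))
world.add_observation(1, 0, (96.0, 121.0))
world.add_observation(1, 1, (146.0, 119.0))
world.add_observation(2, 1, (140.0, 117.0))
print(world.covisible(0))
```

Keeping both directions of each edge (frame to point and point to frame) is what makes co-visibility queries like the last line cheap.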
To make it easier to implement a SLAM system, GSLAM provides several utility classes. This section briefly introduces three optimized modules, namely Estimator, Optimizer and Vocabulary.
Pure geometric computation remains a fundamental problem that requires robust and accurate real-time solutions. Traditional visual SLAM algorithms and modern visual-inertial solutions rely on geometric vision algorithms for initialization, relocalization, and loop closure. OpenCV provides multiple geometric algorithms, and Kneip provides a toolbox for geometric vision, OpenGV, which is limited to camera pose calculations. GSLAM's Estimator aims to provide a family of closed-form solvers covering all cases, made robust with random sample consensus (RANSAC).
Table 2 lists the algorithms supported by Estimator. Based on the given observations, they are divided into three categories. 2D-2D matches are used to estimate epipolar or homography constraints, from which relative poses can be decomposed. 2D-3D correspondences are used to estimate the central or non-central absolute pose of a monocular or multi-camera system, which is the well-known PnP problem. 3D geometry functions, such as plane fitting and estimating the SIM(3) transformation between two point clouds, are also supported. Most algorithms rely on the open source linear algebra library Eigen, which is header-only and available on most platforms.
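To make the combination of a closed-form solver and RANSAC concrete, here is a small sketch for the plane fitting case mentioned above. It is written from scratch for illustration and does not reproduce the Estimator interface; the function names, the threshold and the iteration count are arbitrary choices.

```python
import math
import random


def fit_plane(p0, p1, p2):
    """Closed-form plane through three points: returns (n, d) with n . x + d = 0."""
    u = [p1[i] - p0[i] for i in range(3)]
    v = [p2[i] - p0[i] for i in range(3)]
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-12:
        return None  # degenerate sample: the three points are collinear
    n = tuple(c / norm for c in n)
    d = -sum(n[i] * p0[i] for i in range(3))
    return n, d


def ransac_plane(points, threshold=0.01, iterations=100, seed=0):
    """Keep the minimal-sample model that explains the most points."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = fit_plane(*rng.sample(points, 3))
        if model is None:
            continue
        n, d = model
        inliers = [p for p in points
                   if abs(n[0] * p[0] + n[1] * p[1] + n[2] * p[2] + d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Fifty points on the plane z = 0 plus ten outliers well above it.
plane_points = [(float(i), float(j), 0.0) for i in range(10) for j in range(5)]
outliers = [(float(k), 0.5 * k, 5.0 + k) for k in range(10)]
model, inliers = ransac_plane(plane_points + outliers)
normal, offset = model
print(normal, offset, len(inliers))
```

A production estimator would normally refine the winning model on all inliers afterwards; that step is left out here to keep the sketch short.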
Nonlinear optimization is the core of modern geometric SLAM systems. Because the Hessian matrix is high-dimensional and sparse, graph structures are used to model the complex estimation problems in SLAM. Several frameworks, including Ceres, G2O, and GTSAM, have been proposed to solve general graph optimization problems, and they are widely used in different SLAM systems. ORB-SLAM and SVO use G2O for bundle adjustment (BA) and pose graph optimization. OKVIS and VINS use Ceres for graph optimization with IMU factors, with a sliding window to control computational complexity. Forster et al. proposed a visual-inertial method based on SVO and used GTSAM to implement the backend.
GSLAM’s Optimizer aims to provide a unified interface for most nonlinear SLAM problems, such as PnP solving, BA and pose graph optimization. A general-purpose plug-in for these problems is implemented on top of the Ceres library. For specific problems such as BA, more efficient implementations, such as PBA and ICE-BA, are also available as plug-ins. With the Optimizer tool, developers can access different implementations through a unified interface, which is especially useful for deep learning-based SLAM systems.
Place recognition is one of the most important parts of a SLAM system and is used for relocalization and loop closure detection. The bag-of-words (BoW) method is widely used in SLAM systems because of its efficiency and excellent performance. FabMap proposes a probabilistic method for appearance-based place recognition, which is used in systems such as RSLAM and LSD-SLAM, and it works with floating-point descriptors like SIFT and SURF. DBoW2 builds a vocabulary tree for training and detection and supports both binary and floating-point descriptors. Refael proposed two improved versions of DBoW2, DBoW3 and FBoW, which simplify the interface and speed up training and loading. Afterwards, ORB-SLAM adopted the ORB descriptor and used DBoW2 for loop detection, relocalization and fast matching. Subsequently, a series of SLAM systems, such as ORB-SLAM2, VINS-Mono and LDSO, used DBoW3 for loop closure detection. It has become the most popular tool for implementing place recognition in SLAM systems.
Inspired by the above work, GSLAM provides a header-only implementation of the DBoW3 vocabulary.
Table 3 compares four bag-of-words libraries. In the experiment, each parent node has 10 child nodes, ORB features are extracted with the ORB-SLAM detector, and SIFT features are extracted with SiftGPU. The results use ORB vocabularies with 4 and 6 levels, plus a SIFT vocabulary. Both FBoW and GSLAM use multi-threading for vocabulary training. GSLAM's implementation outperforms the others in almost all items, including loading and saving vocabularies, training new vocabularies, and converting descriptor lists into BoW vectors for place recognition and feature vectors for fast feature matching. In addition, the GSLAM implementation uses less memory and allocates fewer dynamic memory blocks; fragmentation is the main reason DBoW2 requires so much memory.
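The step of converting a descriptor list into a BoW vector can be shown with a deliberately simplified sketch. It assumes a flat vocabulary instead of a vocabulary tree, treats binary descriptors as plain integers compared by Hamming distance, and leaves out TF-IDF weighting; the similarity is the L1 score used by DBoW-style libraries. None of the function names come from GSLAM or DBoW3.

```python
from collections import Counter


def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda word: hamming(descriptor, vocabulary[word]))


def to_bow(descriptors, vocabulary):
    """Quantize descriptors and return an L1-normalized sparse histogram."""
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def l1_score(v1, v2):
    """Similarity in [0, 1] for L1-normalized vectors: 1 - 0.5 * |v1 - v2|_1."""
    words = set(v1) | set(v2)
    return 1.0 - 0.5 * sum(abs(v1.get(w, 0.0) - v2.get(w, 0.0)) for w in words)


# Toy 8-bit vocabulary and descriptors, made up for the example.
vocabulary = [0b00000000, 0b11110000, 0b00001111]
image_a = [0b00000001, 0b11110001, 0b11100000]
image_b = [0b00001111, 0b00001110]

bow_a = to_bow(image_a, vocabulary)
bow_b = to_bow(image_b, vocabulary)
print(bow_a, bow_b, l1_score(bow_a, bow_a), l1_score(bow_a, bow_b))
```

A vocabulary tree replaces the linear scan in `nearest_word` with a descent through a few levels, which is where the branching factor and level counts in Table 3 come in.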
Existing benchmarks require users to download test datasets and upload their results for accuracy evaluation, which is not sufficient to unify the operating environment or to compare performance fairly. Thanks to GSLAM's unified interface, the evaluation of SLAM systems becomes more elegant. With the help of GSLAM, developers can simply upload a SLAM plug-in and run various evaluations of speed, computational cost, and accuracy in a dockerized environment with fixed resources. In this section, some datasets and implemented SLAM plug-ins are introduced first. Then, three representative SLAM implementations are evaluated on speed, accuracy, memory, and CPU usage. This evaluation aims to demonstrate that a unified SLAM benchmark can be run across different SLAM plug-ins.
Running a SLAM system typically requires sensor data streams and corresponding configuration. In order to allow developers to focus on the development of core SLAM plug-ins, GSLAM provides a standard data set interface, and developers do not need to care about SLAM input. Online sensor input and offline data are provided through different data set plug-ins. The correct plug-in will be dynamically loaded according to the given data set path suffix. The dataset implementation should provide all requested sensor streams and associated configuration, so no additional setup is required for different datasets. All different sensor streams are published through Messenger, using standard topic names and data formats.
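The suffix-based plug-in selection described above can be sketched with a simple registry. Everything here is hypothetical: the `.tumdemo` suffix, the class names, the topic name and the frame dictionaries are invented for the example, and real GSLAM plug-ins are shared libraries loaded at run time rather than Python classes.

```python
import os

DATASET_PLUGINS = {}  # path suffix -> dataset class


def register_dataset(suffix):
    """Class decorator that registers a dataset plug-in for a path suffix."""
    def decorator(cls):
        DATASET_PLUGINS[suffix] = cls
        return cls
    return decorator


class Dataset:
    topic = "dataset/frames"

    def __init__(self, path):
        self.path = path

    def frames(self):
        raise NotImplementedError


@register_dataset(".tumdemo")  # made-up suffix for this sketch
class DemoRgbdDataset(Dataset):
    topic = "images/rgbd"

    def frames(self):
        # A real plug-in would read sensor data and calibration from self.path.
        for index in range(3):
            yield {"id": index, "timestamp": index / 30.0}


def open_dataset(path):
    """Pick the plug-in from the path suffix, as the text describes."""
    suffix = os.path.splitext(path)[1].lower()
    if suffix not in DATASET_PLUGINS:
        raise ValueError(f"no dataset plug-in registered for {suffix!r}")
    return DATASET_PLUGINS[suffix](path)


def play(dataset, publish):
    """Push every frame to the given publish callback under the dataset topic."""
    count = 0
    for frame in dataset.frames():
        publish(dataset.topic, frame)
        count += 1
    return count


log = []
dataset = open_dataset("/data/sequence01.tumdemo")
played = play(dataset, lambda topic, frame: log.append((topic, frame["id"])))
print(type(dataset).__name__, played, log)
```

The SLAM side only sees topics and frames, so swapping the dataset means changing nothing but the path.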
GSLAM has implemented several popular visual SLAM dataset plug-ins, as shown in Table 4. Users can also very easily implement a dataset plugin based on the header-only GSLAM core, publish it as a plugin and compile it with the application.
Figure 2 shows some screenshots of open source SLAM and SfM plug-ins running in the built-in Qt visualizer. The framework supports SLAM systems with different architectures, including direct methods, semi-direct methods, feature-based methods, and even SfM methods. The DSO implementation needs to publish results such as point clouds, camera poses, trajectories, and the pose graph for visualization, just as ROS-based implementations do. Users can access different SLAM plug-ins through a unified framework, and it is very convenient to develop SLAM-based applications on the C++, Python and Node-JS interfaces. Since many researchers use ROS in development, GSLAM also provides a ROS visualization plug-in that seamlessly transmits ROS-defined messages, so developers can use Rviz for display or continue developing other ROS-based applications.
Since most existing benchmarks only provide datasets, or do not provide ground truth that lets users run their own evaluations, GSLAM provides a built-in plug-in and some script tools for evaluating computational performance and accuracy.
The paper uses the sequence nostructure-texture-near-withloop in the TUM RGBD data set to demonstrate the execution of the evaluation. The following experiments use three open source monocular SLAM plug-ins DSO, SVO and ORB-SLAM. In all experiments, a computer with i7-6700 CPU, GTX 1060 GPU and 16GB RAM running 64-bit Ubuntu 16.04 was used.
Computational performance evaluation covers memory usage, the number of allocated memory blocks, CPU usage and per-frame timing statistics, as shown in Figure 3. The results show that SVO uses the least memory and CPU resources and achieves the fastest speed. Since SVO is only a visual odometry system and maintains just a local map inside the implementation, its cost remains stable. DSO allocates fewer memory blocks, but consumes more than 100MB of memory, which grows slowly. One problem with DSO is that the processing time increases dramatically when the frame number is below 500; in addition, keyframes take even longer to process. ORB-SLAM uses the most CPU resources and has a stable computation time, but its memory usage increases rapidly, and it allocates and releases a large number of memory blocks because its BA uses the G2O library and does not use an incremental optimization method.
Figure 4 shows the evaluation results for the odometry trajectory. As shown in the figure, SVO is faster but has larger drift, while ORB-SLAM achieves the highest accuracy in terms of absolute pose error (APE). Since the evaluation itself is a pluggable application, more evaluation metrics, such as point cloud accuracy, can be implemented as well.
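For readers unfamiliar with the metric, the translational part of APE is commonly summarized as a root-mean-square error over matched poses. The sketch below assumes the estimated and ground-truth trajectories are already time-associated and expressed in the same frame and scale; a real evaluation of monocular results would first need an alignment step, which is omitted here. The function name and the sample numbers are made up.

```python
import math


def ape_rmse(estimated, ground_truth):
    """RMSE of position differences between matched trajectory samples."""
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and equally long")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est, gt))
        for est, gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up positions in metres, already aligned and matched by timestamp.
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
estimated = [(0.0, 0.0, 0.0), (1.0, 0.3, 0.0), (2.0, 0.0, 0.4)]
rmse = ape_rmse(estimated, ground_truth)
print(rmse)
```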
This article introduces a new general-purpose SLAM platform called GSLAM, which supports the full path from development through evaluation to application. Through this platform, commonly used toolkits are provided as plug-ins, and users can also easily develop their own modules. To keep the platform easy to use, the interface depends only on C++11. In addition, Python and JavaScript interfaces are provided to better integrate traditional SLAM and deep learning-based SLAM, or to perform distributed operations on the Web.
In future work, more SLAM implementations, documentation and demonstration code will be provided to make the platform easier to learn and use. In addition, the integration of traditional SLAM and deep learning-based SLAM will be provided to further explore the unknown possibilities of SLAM systems.
The homepage of this work is as follows:
GSLAM: Main Page
It feels like a framework for learning the principles of each part of SLAM~
Original link: https://mp.weixin.qq.com/s/PCxhqhK3t1soN5FI0w9NFw