The latest from Oxford University! MicKey: 2D image matching in 3D SOTA! (CVPR'24)

WBOY
2024-04-23 13:20:21

Preface

Project link: https://nianticlabs.github.io/mickey/

Given two images, we can estimate the relative camera pose between them by establishing correspondences between the images. Typically these correspondences are 2D-to-2D, and the estimated pose is defined only up to scale. Some applications, such as instant augmented reality anytime, anywhere, require metric-scale pose estimates, so they rely on external depth estimators to recover scale.
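The scale ambiguity of 2D-to-2D correspondences can be checked numerically. The sketch below is only an illustration, not part of MicKey: the scene, the pose, and all function names are hypothetical. It shows that the epipolar constraint is satisfied equally well by a translation t and by 5t, so 2D matches alone cannot fix the metric length of the baseline.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def normalized_projection(X):
    """Project 3D points (N, 3) to homogeneous normalized image coordinates (N, 3)."""
    return X / X[:, 2:3]

def epipolar_residuals(R, t, x1, x2):
    """Residuals x2^T E x1 with E = [t]_x R; all zero for a consistent pose."""
    E = skew(t) @ R
    return np.einsum("ni,ij,nj->n", x2, E, x1)

# Hypothetical scene: random points in front of camera 1, small rotation about y.
rng = np.random.default_rng(0)
X1 = rng.uniform([-1, -1, 2], [1, 1, 5], size=(50, 3))
a = 0.2
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([0.3, 0.0, 0.1])

# The 2D observations are generated with the true translation t ...
x1 = normalized_projection(X1)
x2 = normalized_projection(X1 @ R.T + t)

# ... yet the constraint also holds for a translation five times longer.
res_true = epipolar_residuals(R, t, x1, x2)
res_scaled = epipolar_residuals(R, 5.0 * t, x1, x2)
print(np.abs(res_true).max(), np.abs(res_scaled).max())
```

Both residual vectors vanish up to floating-point error, which is why pipelines based on 2D matches need an extra source of scale such as a depth estimator.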

This article introduces MicKey, a keypoint matching pipeline that predicts metric correspondences in 3D camera space. By learning to match 3D coordinates across images, it can infer the metric relative pose without depth measurements. Depth measurements, scene reconstruction, and image overlap information are not needed during training either. MicKey is supervised only by image pairs and their relative poses. It achieves state-of-the-art performance on the map-free relocalization benchmark while requiring less supervision than competing methods.

Metric Keypoints (MicKey) is a feature detection pipeline that addresses two problems. First, MicKey regresses keypoint positions in camera space, so that descriptor matching establishes metric correspondences. From these metric correspondences, the metric relative pose can be recovered, as shown in Figure 1. Second, by using differentiable pose optimization for end-to-end training, MicKey requires only image pairs and their ground-truth relative poses for supervision. MicKey learns the correct depth of keypoints implicitly during training, and the training process is robust to image pairs with unknown visual overlap, so information obtained through SfM (such as image overlap) is not needed. This weak supervision makes MicKey very accessible and attractive, because training it on new domains requires no additional information beyond poses.

On the map-free relocalization benchmark, MicKey ranks first, outperforming recent state-of-the-art methods. It provides reliable metric-scale pose estimates even under extreme viewpoint changes, which is made possible by depth predictions tailored specifically to sparse feature matching.
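To make concrete how a metric relative pose follows from metric 3D-to-3D correspondences, here is a minimal sketch using the standard Kabsch (orthogonal Procrustes) solution. It is an assumption that this particular solver is a fair stand-in: the text above does not name one, and the function name and test data are hypothetical.

```python
import numpy as np

def kabsch_metric_pose(X, Y):
    """Least-squares rigid transform (R, t) such that Y ≈ X @ R.T + t.

    X, Y: (N, 3) arrays of corresponding 3D points, each expressed in its own
    camera frame and in metric units. No scale factor is estimated, so the
    returned translation is metric as well.
    """
    cx, cy = X.mean(axis=0), Y.mean(axis=0)
    H = (X - cx).T @ (Y - cy)                # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cy - R @ cx
    return R, t

# Hypothetical check: recover a known pose from noise-free correspondences.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -0.2, 1.0])
Y = X @ R_true.T + t_true

R_est, t_est = kabsch_metric_pose(X, Y)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))
```

Because both point sets already live in metric camera space, no external depth estimator is needed to scale the translation afterwards.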

The main contributions are as follows:

MicKey, a neural network that predicts keypoints from a single image and describes them. These keypoints and descriptors allow the metric relative pose between images to be estimated.

A training strategy that requires only relative pose supervision, with no depth measurements and no knowledge of image-pair overlap.

MicKey introduction

MicKey predicts the 3D coordinates of keypoints in camera space. The network also predicts keypoint selection probabilities (the keypoint distribution) and descriptors that determine the probability of a match (the matching distribution). Combining these two distributions gives the probability that two keypoints, one from each image, form a correspondence, and the network is optimized so that correct correspondences become more likely. In a differentiable RANSAC loop, multiple relative pose hypotheses are generated and their losses with respect to the ground-truth transformation are computed. Gradients obtained through REINFORCE are used to train the correspondence probabilities. Since the pose solver and the loss function are differentiable, backpropagation also provides a direct signal for training the 3D keypoint coordinates.
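The combination of the keypoint distribution and the matching distribution can be sketched as follows. This is one plausible formulation under stated assumptions, not necessarily the exact one used by MicKey: the dual-softmax matching term, the softmax over confidence scores, the temperature value, and all names are hypothetical choices for illustration.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def correspondence_probabilities(desc_a, desc_b, score_a, score_b, temperature=0.1):
    """Probability that keypoint i in image A and keypoint j in image B correspond.

    desc_a: (Na, D), desc_b: (Nb, D) descriptors; score_a: (Na,), score_b: (Nb,)
    raw confidence scores. Returns an (Na, Nb) matrix.
    """
    sim = desc_a @ desc_b.T / temperature
    # Matching distribution: agreement of A->B and B->A softmaxes (dual softmax).
    p_match = softmax(sim, axis=1) * softmax(sim, axis=0)
    # Keypoint distribution: how likely each keypoint is to be selected at all.
    p_kp_a = softmax(score_a, axis=0)
    p_kp_b = softmax(score_b, axis=0)
    return p_match * p_kp_a[:, None] * p_kp_b[None, :]

# Hypothetical usage: identical descriptors in both images, uniform scores.
rng = np.random.default_rng(0)
desc = rng.normal(size=(5, 16))
desc /= np.linalg.norm(desc, axis=1, keepdims=True)
P = correspondence_probabilities(desc, desc, np.zeros(5), np.zeros(5))
print(P.argmax(axis=1))
```

In training, correspondences would be sampled from such a matrix and a REINFORCE-style estimator would raise the probability of samples that lead to low pose loss; that step is omitted here.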

1) Learning from metric pose supervision

Given two images, the system computes their metric relative pose together with keypoint scores, matching probabilities, and a pose confidence (in the form of a soft inlier count). Our goal is to train all relative pose estimation modules in an end-to-end manner. During training, we assume that the training data consists of tuples (I, I', T^GT, K, K'), where I and I' are the two images, T^GT is the ground-truth transformation, and K/K' are the camera intrinsics. A schematic of the entire system is shown in Figure 2.

To learn the coordinates, confidences, and descriptors of 3D keypoints, the system needs to be fully differentiable. However, some elements of the pipeline, such as keypoint sampling or inlier counting, are not differentiable, so the relative pose estimation pipeline is reformulated probabilistically. This means that we treat the outputs of the network as probabilities over potential matches, and during training the network optimizes these outputs so that correct matches become more likely to be selected.
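Hard inlier counting is a step function of the residual and therefore has no useful gradient. A common workaround, sketched here as an assumption rather than as MicKey's exact definition, replaces the hard threshold with a sigmoid. The threshold, the sharpness beta, and the function name are hypothetical.

```python
import numpy as np

def soft_inlier_count(X, Y, R, t, threshold=0.15, beta=100.0):
    """Differentiable surrogate for counting inliers of a pose hypothesis (R, t).

    X, Y: (N, 3) corresponding 3D points in metres. A correspondence contributes
    close to 1 if its residual is well below `threshold` and close to 0 if it
    is well above; `beta` controls how sharp the transition is.
    """
    residuals = np.linalg.norm(X @ R.T + t - Y, axis=1)
    z = np.clip(beta * (threshold - residuals), -50.0, 50.0)  # avoid overflow
    return float(np.sum(1.0 / (1.0 + np.exp(-z))))

# Hypothetical usage: four correspondences under the identity pose, one outlier.
X = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 3.0], [0.0, 1.0, 4.0], [1.0, 1.0, 5.0]])
Y = X.copy()
Y[0, 0] += 1.0  # move one point 1 m away so it becomes an outlier
count = soft_inlier_count(X, Y, np.eye(3), np.zeros(3))
print(round(count, 3))
```

The result is close to 3, matching the hard count, while remaining smooth in the 3D coordinates and in the pose.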

2) Network structure

MicKey follows a multi-head network architecture with a shared encoder that infers 3D metric keypoints as well as descriptors from the input image, as shown in Figure 3.

Encoder. We adopt a pre-trained DINOv2 model as the feature extractor and use its features directly, without further training or fine-tuning. DINOv2 divides the input image into patches of size 14×14 and provides a feature vector for each patch. For an input image of width W and height H, the final feature map F has a resolution of (1024, w, h), where w = W/14 and h = H/14.

Keypoint heads. Four parallel heads process the feature map F and compute the xy offset (U), depth (Z), confidence (C), and descriptor (D) maps, where each entry of a map corresponds to a 14×14 patch in the input image. MicKey has the rare property of predicting keypoints as relative offsets from a sparse regular grid. The absolute 2D coordinates are then obtained from the grid position of each entry and its predicted offset.
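The step from grid-relative offsets to absolute pixel coordinates, and on to 3D points in camera space, can be sketched as below. Several details are assumptions made for illustration: offsets are taken to lie in [0, 1] within each patch, maps are stored as (channels, h, w), and the helper names and the intrinsics matrix are hypothetical.

```python
import numpy as np

PATCH = 14  # DINOv2 patch size, as stated in the text

def absolute_keypoints(offsets):
    """Convert per-patch offsets (2, h, w), assumed in [0, 1], to pixel coordinates."""
    _, h, w = offsets.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    grid = np.stack([cols, rows]).astype(float)  # (x index, y index) of each patch
    return PATCH * (grid + offsets)

def backproject(uv, depth, K):
    """Lift pixel coordinates (2, h, w) with depth (h, w) to camera-space points (h*w, 3)."""
    pix = np.stack([uv[0], uv[1], np.ones_like(depth)]).reshape(3, -1)
    rays = np.linalg.inv(K) @ pix            # rays with unit z component
    return (rays * depth.reshape(1, -1)).T

# Hypothetical usage: a 2x3 grid of patches, keypoints at every patch centre.
offsets = np.full((2, 2, 3), 0.5)
uv = absolute_keypoints(offsets)
K = np.array([[100.0, 0.0, 21.0],
              [0.0, 100.0, 14.0],
              [0.0, 0.0, 1.0]])
points = backproject(uv, np.full((2, 3), 2.0), K)
print(uv[0])
print(points[1])
```

Each patch thus contributes at most one keypoint, and the predicted depth turns its 2D position into a metric 3D coordinate.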

Experimental comparison

Relative pose evaluation on the map-free dataset. The area under the curve (AUC) and precision (Prec.) for the VCRE metric at a 90-pixel threshold are reported, with both versions of MicKey achieving the highest results. The median errors are also reported: MicKey obtains the lowest VCRE error, while other methods, such as RoMa, achieve lower pose errors. The median errors are computed only over the valid poses produced by each method, so the total number of estimated poses is reported as well. Finally, matching times are reported; MicKey is comparable to LoFTR and LightGlue and significantly faster than RoMa, its closest competitor in terms of the VCRE metrics. The matching-based methods use DPT to recover scale.
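How the precision and the median over valid poses relate can be illustrated with a small sketch. It assumes that precision is the fraction of all query frames whose VCRE falls below the threshold, with failed estimates counted as misses; the benchmark's exact protocol may differ, and the error values below are made up for the example.

```python
import numpy as np

def vcre_precision(vcre_px, threshold=90.0):
    """Fraction of queries with VCRE below `threshold` (np.nan = no pose estimated)."""
    vcre_px = np.asarray(vcre_px, dtype=float)
    hits = np.nan_to_num(vcre_px, nan=np.inf) < threshold
    return float(hits.mean())

def median_over_valid(vcre_px):
    """Median error over valid poses only, plus the number of valid poses."""
    vcre_px = np.asarray(vcre_px, dtype=float)
    valid = vcre_px[~np.isnan(vcre_px)]
    return float(np.median(valid)), int(valid.size)

# Hypothetical per-query errors in pixels; the third query produced no pose.
errors = [30.0, 120.0, np.nan, 85.0]
precision = vcre_precision(errors)
median, n_valid = median_over_valid(errors)
print(precision, median, n_valid)
```

A method that only returns poses for easy queries can show a low median while its precision stays low, which is why the number of estimated poses is reported next to the median.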

Examples of the correspondences, score maps, and depth maps generated by MicKey. MicKey finds valid correspondences even in the presence of large scale changes or wide baselines. Note that, because of the feature encoder, the resolution of the depth map is 14 times smaller than that of the input image. We follow the depth map visualization used in DPT, where lighter colors represent closer distances.

Relative pose evaluation on the ScanNet dataset. All feature matching methods are used in conjunction with PlaneRCNN to recover metric scale. We indicate the training signals used by each method: depth (D), overlap score (O), and pose (P).


This article is reproduced from 51cto.com.