Multi-output Learning for Camera Relocalization

Guzman-Rivera, Abner; Kohli, Pushmeet; Glocker, Ben; Shotton, Jamie; Sharp, Toby; Fitzgibbon, Andrew; Izadi, Shahram

doi:10.1109/cvpr.2014.146

Cited by 97 publications

(87 citation statements)

References 14 publications

“…In recent years, offline approaches based on using regression to predict 2D-to-3D correspondences [25], [26], [27], [28], [30], [32] have been shown to achieve state-of-the-art camera relocalisation results, but their adoption for online relocalisation in practical systems such as InfiniTAM [3], [13] has been hindered by the need to train extensively on the target scene ahead of time. In [37], we showed that it was possible to circumvent this limitation by adapting offline-trained regression forests to novel scenes online.…”

Section: Results (mentioning)

confidence: 99%

Real-Time RGB-D Camera Pose Estimation in Novel Scenes Using a Relocalisation Cascade

Cavallari, Golodetz, Lord et al. 2020

IEEE Trans. Pattern Anal. Mach. Intell.

Camera pose estimation is an important problem in computer vision, with applications as diverse as simultaneous localisation and mapping, virtual/augmented reality and navigation. Common techniques match the current image against keyframes with known poses coming from a tracker, directly regress the pose, or establish correspondences between keypoints in the current image and points in the scene in order to estimate the pose. In recent years, regression forests have become a popular alternative to establish such correspondences. They achieve accurate results, but have traditionally needed to be trained offline on the target scene, preventing relocalisation in new environments. Recently, we showed how to circumvent this limitation by adapting a pre-trained forest to a new scene on the fly. The adapted forests achieved relocalisation performance that was on par with that of offline forests, and our approach was able to estimate the camera pose in close to real time, which made it desirable for systems that require online relocalisation. In this paper, we present an extension of this work that achieves significantly better relocalisation performance whilst running fully in real time. To achieve this, we make several changes to the original approach: (i) instead of simply accepting the camera pose hypothesis produced by RANSAC without question, we make it possible to score the final few hypotheses it considers using a geometric approach and select the most promising one; (ii) we chain several instantiations of our relocaliser (with different parameter settings) together in a cascade, allowing us to try faster but less accurate relocalisation first, only falling back to slower, more accurate relocalisation as necessary; and (iii) we tune the parameters of our cascade, and the individual relocalisers it contains, to achieve effective overall performance. 
Taken together, these changes allow us to significantly improve upon the performance our original state-of-the-art method was able to achieve on the well-known 7-Scenes and Stanford 4 Scenes benchmarks. As additional contributions, we present a novel way of visualising the internal behaviour of our forests, and use the insights gleaned from this to show how to entirely circumvent the need to pre-train a forest on a generic scene.
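
The cascade idea in point (ii) of the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the names `cascade_relocalise`, `Relocaliser`, and the per-stage `max_error` thresholds are hypothetical, each stage is assumed to return a pose together with an error score where lower is better, and falling back to the best pose seen when no stage passes its threshold is an assumption of this sketch.

```python
from typing import Callable, Optional, Sequence, Tuple

Pose = Tuple[float, ...]  # hypothetical pose representation
# A relocaliser maps a frame to (pose, error score), or None on failure.
Relocaliser = Callable[[object], Optional[Tuple[Pose, float]]]


def cascade_relocalise(
    frame: object,
    stages: Sequence[Tuple[Relocaliser, float]],
) -> Optional[Tuple[Pose, float]]:
    """Try faster stages first; fall back to slower ones only as needed.

    `stages` is ordered from fastest to slowest. Each entry pairs a
    relocaliser with the largest error score at which its pose is accepted.
    """
    best: Optional[Tuple[Pose, float]] = None
    for relocalise, max_error in stages:
        result = relocalise(frame)
        if result is None:
            continue  # this stage produced no pose; try the next one
        pose, error = result
        if best is None or error < best[1]:
            best = (pose, error)
        if error <= max_error:
            return pose, error  # good enough: skip the slower stages
    # Assumption of this sketch: if no stage met its threshold,
    # return the lowest-error pose seen (or None if every stage failed).
    return best


# Usage with two stand-in relocalisers (both hypothetical).
def fast(frame):
    return (0.0, 0.0, 0.0), 0.2


def slow(frame):
    return (1.0, 1.0, 1.0), 0.01


print(cascade_relocalise("frame", [(fast, 0.05), (slow, 0.05)]))
```

Ordering the stages from fastest to slowest means the slower relocalisers only run on frames where the cheaper ones are not confident enough.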

“…While approaches exist that require an offline training phase (e.g. [27], [28]), below we focus on methods which are capable of online real-time performance. One can roughly categorize existing approaches into two categories, though hybrid [29] and more exotic variants exist [30], [31].…”

Section: Related Work (mentioning)

confidence: 99%

Real-Time RGB-D Camera Relocalization via Randomized Ferns for Keyframe Encoding

Glocker, Shotton, Criminisi et al. 2015

IEEE Trans. Visual. Comput. Graphics

Self Cite

Recovery from tracking failure is essential in any simultaneous localization and tracking system. In this context, we explore an efficient keyframe-based relocalization method based on frame encoding using randomized ferns. The method enables automatic discovery of keyframes through online harvesting in tracking mode, and fast retrieval of pose candidates in the case when tracking is lost. Frame encoding is achieved by applying simple binary feature tests which are stored in the nodes of an ensemble of randomized ferns. The concatenation of small block codes generated by each fern yields a global compact representation of camera frames. Based on those representations we define the frame dissimilarity as the block-wise hamming distance (BlockHD). Dissimilarities between an incoming query frame and a large set of keyframes can be efficiently evaluated by simply traversing the nodes of the ferns and counting image co-occurrences in corresponding code tables. In tracking mode, those dissimilarities decide whether a frame/pose pair is considered as a novel keyframe. For tracking recovery, poses of the most similar keyframes are retrieved and used for reinitialization of the tracking algorithm. The integration of our relocalization method into a hand-held KinectFusion system allows seamless continuation of mapping even when tracking is frequently lost.
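
The frame encoding and block-wise Hamming distance described in the abstract above might look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the paper's code: the names `encode_frame`, `block_hd`, and the `Fern` type are hypothetical, a frame is simplified to a flat list of pixel values, each binary test is assumed to compare one pixel against a threshold, and normalising the distance by the number of ferns is an assumption of this sketch.

```python
from typing import List, Sequence, Tuple

# A fern is a short list of binary tests; each test is (pixel index, threshold).
Fern = List[Tuple[int, float]]


def encode_frame(pixels: Sequence[float], ferns: Sequence[Fern]) -> List[int]:
    """Encode a frame as one small block code (an integer) per fern."""
    codes = []
    for fern in ferns:
        block = 0
        for index, threshold in fern:
            bit = 1 if pixels[index] >= threshold else 0
            block = (block << 1) | bit  # append this test's bit to the block
        codes.append(block)
    return codes


def block_hd(code_a: Sequence[int], code_b: Sequence[int]) -> float:
    """Fraction of blocks on which two frame codes differ."""
    if len(code_a) != len(code_b) or len(code_a) == 0:
        raise ValueError("codes must be non-empty and of equal length")
    differing = sum(1 for a, b in zip(code_a, code_b) if a != b)
    return differing / len(code_a)


# Usage: two ferns with two tests each, and two toy four-pixel frames.
ferns = [[(0, 0.5), (1, 0.5)], [(2, 0.5), (3, 0.5)]]
frame_a = [0.9, 0.1, 0.2, 0.8]
frame_b = [0.9, 0.1, 0.7, 0.8]
print(block_hd(encode_frame(frame_a, ferns), encode_frame(frame_b, ferns)))
```

A block counts as different if any of its bits differ, so the distance measures how many ferns disagree rather than how many individual tests disagree.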

“…Scene coordinate regression methods [44,17,49,7,31,32,6,12,33,8] also estimate 2D-3D correspondences between image and environment but do so densely for each pixel of the input image. This circumvents the need for a feature detector with the aforementioned drawbacks of feature-based methods.…”

Section: Related Work (mentioning)

confidence: 99%

Expert Sample Consensus Applied to Camera Re-Localization

Brachmann, Rother 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Fitting model parameters to a set of noisy data points is a common problem in computer vision. In this work, we fit the 6D camera pose to a set of noisy correspondences between the 2D input image and a known 3D environment. We estimate these correspondences from the image using a neural network. Since the correspondences often contain outliers, we utilize a robust estimator such as Random Sample Consensus (RANSAC) or Differentiable RANSAC (DSAC) to fit the pose parameters. When the problem domain, e.g. the space of all 2D-3D correspondences, is large or ambiguous, a single network does not cover the domain well. Mixture of Experts (MoE) is a popular strategy to divide a problem domain among an ensemble of specialized networks, so called experts, where a gating network decides which expert is responsible for a given input. In this work, we introduce Expert Sample Consensus (ESAC), which integrates DSAC in a MoE. Our main technical contribution is an efficient method to train ESAC jointly and end-to-end. We demonstrate experimentally that ESAC handles two real-world problems better than competing methods, i.e. scalability and ambiguity. We apply ESAC to fitting simple geometric models to synthetic images, and to camera re-localization for difficult, real datasets.
