Benchmarking Image Retrieval for Visual Localization

Pion, Noé; Humenberger, Martin; Csurka, Gabriela; Cabon, Yohann; Sattler, Torsten

doi:10.1109/3dv50981.2020.00058

Cited by 60 publications

(34 citation statements)

References 102 publications

(268 reference statements)

“…These methods are trained with a classification loss, which prevents us from training them for ground image retrieval. However, they have been found to have good generalization capabilities [23], as they outperform other methods for visual localization tasks without being trained on the evaluation dataset. Accordingly, we employ these methods using pre-trained weights.…”

Section: Discussion (mentioning)

confidence: 99%

Deep Metric Learning for Ground Images

Radhakrishnan, Schmid, Scholz et al. 2021

Preprint


Ground texture based localization methods are potential prospects for low-cost, high-accuracy self-localization solutions for robots. These methods estimate the pose of a given query image, i.e. the current observation of the ground from a downward-facing camera, with respect to a set of reference images whose poses are known in the application area. In this work, we deal with the initial localization task, in which we have no prior knowledge about the current robot positioning. In this situation, the localization method would have to consider all available reference images. However, in order to reduce computational effort and the risk of receiving a wrong result, we would like to consider only those reference images that are actually overlapping with the query image. For this purpose, we propose a deep metric learning approach that retrieves the most similar reference images to the query image. In contrast to existing approaches to image retrieval for ground images, our approach achieves significantly better recall performance and improves the localization performance of a state-of-the-art ground texture based localization method.
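The retrieval step described in this abstract (keeping only the reference images most similar to the query) can be illustrated with a minimal sketch. It assumes each image has already been mapped to a fixed-length descriptor by some learned embedding; the function names `cosine_similarity` and `retrieve_top_k`, the image names, and the toy two-dimensional descriptors are hypothetical and not taken from the cited paper, which may use a different similarity measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve_top_k(query, references, k=2):
    """Return the names of the k reference descriptors most similar to the query.

    `references` maps a (hypothetical) image name to its descriptor.
    """
    scored = [(cosine_similarity(query, desc), name)
              for name, desc in references.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Toy descriptors, for illustration only.
references = {
    "ref_a": [1.0, 0.0],
    "ref_b": [0.0, 1.0],
    "ref_c": [1.0, 1.0],
}
shortlist = retrieve_top_k([1.0, 0.1], references, k=2)
print(shortlist)
```

Only the shortlisted reference images would then be passed on to the more expensive pose estimation step.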

“…Therefore, they are often used as the first step to hierarchical localisation [18,47] and relative pose regression [11,46]. A recent review of retrieval based localisation can be found in [38].…”

Section: Visual Localisation (mentioning)

confidence: 99%

“…Despite the initial success, pose regression methods have been on the back foot since the study by Sattler et al [53] showed that the performance of pose regression methods is closer to less accurate image retrieval [38] than to 3D structure-based methods [50]. This is due to the fact that learning-based methods do not extrapolate well beyond the poses they encounter in training.…”

Section: Introduction (mentioning)

confidence: 99%

Reassessing the Limitations of CNN Methods for Camera Pose Regression

Ng, Lopez-Rodriguez, Balntas et al. 2021

Preprint


In this paper, we address the problem of camera pose estimation in outdoor and indoor scenarios. In comparison to the currently top-performing methods that rely on 2D to 3D matching, we propose a model that can directly regress the camera pose from images with significantly higher accuracy than existing methods of the same class. We first analyse why regression methods are still behind the state-of-the-art, and we bridge the performance gap with our new approach. Specifically, we propose a way to overcome the biased training data by a novel training technique, which generates poses guided by a probability distribution from the training set for synthesising new training views. Lastly, we evaluate our approach on two widely used benchmarks and show that it achieves significantly improved performance compared to prior regression-based methods, retrieval techniques as well as 3D pipelines with local feature matching.


“…Feature extraction refers to the process of computing one or multiple descriptors per image. Deep convolutional neural networks (CNNs) are now commonly used as feature extractors, yielding results superior to hand-crafted features [1,17] due to their ability to adapt to a given task and their high expressivity. Using a backbone CNN, feature extraction consists of passing an image through a series of layers to get a tensor of activations (a 3D volume of high-level information) which is processed to extract local or global descriptors (or both [18]).…”

Section: Retrieval Framework (mentioning)

confidence: 99%
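The statement above describes turning a tensor of CNN activations into a global descriptor. One common way to do this is generalized-mean (GeM) pooling followed by L2 normalization; the sketch below uses it purely as an illustration, since the quoted passage does not say which pooling the citing paper applies. The activation tensor is represented as plain nested lists (channels, rows, columns), and the function names and the exponent `p=3.0` are assumptions.

```python
import math


def gem_pool(activations, p=3.0, eps=1e-6):
    """Generalized-mean pooling of a C x H x W activation tensor.

    `activations` is a list of C channels, each a list of H rows of W floats.
    Returns one value per channel: p=1 gives average pooling, and large p
    approaches max pooling.
    """
    descriptor = []
    for channel in activations:
        # Clamp to a small positive value so fractional powers are defined.
        values = [max(v, eps) for row in channel for v in row]
        mean_pow = sum(v ** p for v in values) / len(values)
        descriptor.append(mean_pow ** (1.0 / p))
    return descriptor


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


# Toy tensor with 2 channels of size 2 x 2, for illustration only.
toy_activations = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
global_descriptor = l2_normalize(gem_pool(toy_activations, p=3.0))
print(global_descriptor)
```

The resulting unit-length vector has one entry per channel and can be compared across images with a dot product.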

“…In this work, we approach the problem of interconnecting cultural heritage image data from a purely 2D, content-based point-of-view. This image-based approach can serve as an entry point before engaging further towards complex modelization: 3D models rely on image localization and pose estimation [1]; 4D models (including time) need multiple views through time [2]; 5D models (time and scale) additionally make use of varying level of details available through various sources to build advanced representations [3]. Gathering and interconnecting image data are an essential starting step towards a better understanding of our cultural heritage, be it through dating content by reasoning [4], following the evolution of an area [5], reconstructing lost monuments [6], or visualization in a spatialized environment [7].…”

Section: Introduction (mentioning)

confidence: 99%

Connecting Images through Sources: Exploring Low-Data, Heterogeneous Instance Retrieval

2021


Along with a new volume of images containing valuable information about our past, the digitization of historical territorial imagery has brought the challenge of understanding and interconnecting collections with unique or rare representation characteristics, and sparse metadata. Content-based image retrieval offers a promising solution in this context, by building links in the data without relying on human supervision. However, while the latest propositions in deep learning have shown impressive results in applications linked to feature learning, they often rely on the hypothesis that there exists a training dataset matching the use case. Increasing generalization and robustness to variations remains an open challenge, poorly understood in the context of real-world applications. Introducing the alegoria benchmark, containing multi-date vertical and oblique aerial digitized photography mixed with more modern street-level pictures, we formulate the problem of low-data, heterogeneous image retrieval, and propose associated evaluation setups and measures. We propose a review of ideas and methods to tackle this problem, extensively compare state-of-the-art descriptors and propose a new multi-descriptor diffusion method to exploit their comparative strengths. Our experiments highlight the benefits of combining descriptors and the compromise between absolute and cross-domain performance.

