2020 International Conference on 3D Vision (3DV)
DOI: 10.1109/3dv50981.2020.00058
Benchmarking Image Retrieval for Visual Localization

Abstract: Visual localization, i.e., camera pose estimation in a known scene, is a core component of technologies such as autonomous driving and augmented reality. State-of-the-art localization approaches often rely on image retrieval techniques for one of two tasks: (1) provide an approximate pose estimate or (2) determine which parts of the scene are potentially visible in a given query image. It is common practice to use state-of-the-art image retrieval algorithms for these tasks. These algorithms are often trained f…

Cited by 60 publications (34 citation statements)
References 102 publications (268 reference statements)
“…These methods are trained with a classification loss, which prevents us from training them for ground image retrieval. However, they have been found to have good generalization capabilities [23], as they outperform other methods for visual localization tasks without being trained on the evaluation dataset. Accordingly, we employ these methods using pre-trained weights¹.…”
Section: Discussion
confidence: 99%
“…Therefore, they are often used as the first step to hierarchical localisation [18,47] and relative pose regression [11,46]. A recent review of retrieval based localisation can be found in [38].…”
Section: Visual Localisation
confidence: 99%
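The retrieval step this quote describes — the first stage of hierarchical localisation — can be sketched in a few lines. This is an illustrative example, not the pipeline of any cited paper: the descriptor dimension (128), database size (100), and the helper name `top_k_retrieval` are all assumptions, and cosine similarity over unit-norm global descriptors stands in for whatever matching a real system uses.

```python
import numpy as np

# Illustrative sketch of retrieval-based coarse localization: match a
# query's global descriptor against a database of descriptors whose
# camera poses are known; the top-ranked images give an approximate pose.

def top_k_retrieval(query, database, k=3):
    """Rank database images by cosine similarity to the query.
    Query and database rows are assumed L2-normalized, so the dot
    product equals cosine similarity."""
    sims = database @ query           # (N,) similarity scores
    return np.argsort(-sims)[:k]      # indices of the k best matches

rng = np.random.default_rng(1)
db = rng.normal(size=(100, 128))
db /= np.linalg.norm(db, axis=1, keepdims=True)

# Simulate a query taken near database image 42 (small perturbation).
q = db[42] + 0.02 * rng.normal(size=128)
q /= np.linalg.norm(q)
print(top_k_retrieval(q, db))         # image 42 ranks first
```

In a full system, the poses of the retrieved images would then seed local feature matching or relative pose regression, as the quoted passage notes.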
“…Despite the initial success, pose regression methods have been on the back foot since the study by Sattler et al [53] showed that the performance of pose regression methods is closer to less accurate image retrieval [38] than to 3D structure-based methods [50]. This is due to the fact that learning-based methods do not extrapolate well beyond the poses they encounter in training.…”
Section: Introduction
confidence: 99%
“…Feature extraction refers to the process of computing one or multiple descriptors per image. Deep convolutional neural networks (CNNs) are now commonly used as feature extractors, yielding results superior to hand-crafted features [1,17] due to their ability to adapt to a given task and their high expressivity. Using a backbone CNN, feature extraction consists of passing an image through a series of layers to get a tensor of activations (a 3D volume of high-level information) which is processed to extract local or global descriptors (or both [18]).…”
Section: Retrieval Framework
confidence: 99%
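The pooling step the passage describes — reducing a 3D activation volume to a global descriptor — can be sketched without a real backbone. This is a minimal illustration, not the cited paper's method: random activations stand in for a CNN's output, and generalized-mean (GeM) pooling is assumed as one common choice of aggregation; the shapes and function name are arbitrary.

```python
import numpy as np

# Illustrative sketch: turn a CNN activation tensor (channels x height
# x width) into a single global image descriptor by spatial pooling.
# The backbone itself is omitted; random values play its role here.

def global_descriptor(activations, p=3.0):
    """Generalized-mean (GeM) pooling over the spatial dimensions,
    followed by L2 normalization. p=1 reduces to average pooling;
    as p grows the result approaches max pooling."""
    clipped = np.maximum(activations, 1e-6)            # GeM needs positives
    pooled = np.mean(clipped ** p, axis=(1, 2)) ** (1.0 / p)
    return pooled / np.linalg.norm(pooled)

rng = np.random.default_rng(0)
feats = rng.random((512, 7, 7))   # fake activations: 512 maps, 7x7 grid
desc = global_descriptor(feats)
print(desc.shape)                 # (512,) -- one value per channel
```

Local descriptors, by contrast, would keep the spatial dimensions and describe individual activation locations rather than pooling them away.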
“…In this work, we approach the problem of interconnecting cultural heritage image data from a purely 2D, content-based point-of-view. This image-based approach can serve as an entry point before engaging further towards complex modelization: 3D models rely on image localization and pose estimation [1]; 4D models (including time) need multiple views through time [2]; 5D models (time and scale) additionally make use of varying level of details available through various sources to build advanced representations [3]. Gathering and interconnecting image data are an essential starting step towards a better understanding of our cultural heritage, be it through dating content by reasoning [4], following the evolution of an area [5], reconstructing lost monuments [6], or visualization in a spatialized environment [7].…”
Section: Introduction
confidence: 99%