“…We define visual location estimation as the process of estimating the geographical coordinates of a scene based solely on the visual cues present in the given image. One can think of multiple variations of this task, depending on whether the input is restricted to a particular kind of scene, e.g., landmarks [3,5,46], or to a particular area [1,17,44], or on whether a different kind of input is used, e.g., a sequence of images per scene [2,31,32] or aerial imagery [24,34,43]. In this study, we focus on global-scale location estimation from single images, which is the most challenging problem setting.…”