Visual geolocalisation remains a challenge in the research community: given a query image and a geo-tagged reference database, the goal is to derive a location estimate for the query image. We propose a four-step approach to the geolocalisation problem. Essentially, our approach re-ranks the candidate images after image retrieval by considering the visual similarity of each candidate, and of its neighbouring images, to the query image. Introducing the neighbouring images enriches the visual information available for a candidate location. We evaluate on three street view datasets, where our approach outperforms three baseline approaches in location estimation accuracy on two of them. We discuss, firstly, whether using deep features for image retrieval helps improve location estimation accuracy, and the effectiveness of geographical neighbourhoods; secondly, the impact of different deep architectures for feature extraction on estimation accuracy; and thirdly, whether our approach consistently outperforms the classic 1-NN approach on two datasets with significantly different visual elements.
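The neighbour-enriched re-ranking described above can be sketched as follows. This is a minimal illustration, not the paper's exact method: the similarity measure (cosine), the aggregation over neighbours (mean), and the blending weight `alpha` are all assumptions introduced here for clarity.

```python
import numpy as np

def cosine_sim(query, feats):
    # Cosine similarity between a query vector and each row of a feature matrix.
    q = query / np.linalg.norm(query)
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return f @ q

def rerank(query_feat, cand_feats, neighbour_feats, alpha=0.5):
    """Re-rank retrieved candidates by blending each candidate's own
    similarity to the query with the mean similarity of its geographical
    neighbours to the query. `alpha`, cosine similarity, and the mean
    aggregation are illustrative choices, not taken from the paper."""
    own = cosine_sim(query_feat, cand_feats)
    nbr = np.array([cosine_sim(query_feat, n).mean() for n in neighbour_feats])
    scores = alpha * own + (1 - alpha) * nbr
    return np.argsort(-scores)  # candidate indices, best first
```

A candidate whose surrounding reference images also resemble the query is promoted over a candidate that matches in isolation, which is the intuition behind enriching each location with its neighbourhood.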
INTRODUCTION

Humans are able to recognize places and infer locations from images: provided with a photo of the Ruins of St. Paul's, some may instantly know it is a landmark in Macau, China. A more challenging example would be an image of coconut trees by the sea. In this case, we may have trouble deriving an accurate location estimate from such a generic photo. However, we know the photo depicts "somewhere in the tropical areas", and this knowledge helps narrow down the list of candidate locations. Data association [1] and semantic reasoning both play an essential role in the human ability of visual place recognition [2]. Hays and Efros pointed out in 2008 that semantic reasoning is a huge challenge computationally [2]. Luckily, large image collections available on the Internet have paved the way for data-driven approaches to visual geolocalisation. Many research efforts have been devoted to solving this problem, for prospective applications in robot vision [3], landmark identification [4], as well as city reconstruction [5]. In this article, we confine ourselves to a more defined problem: given a query image, can we derive a location estimate of the query,

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.