SRAL: Shared Representative Appearance Learning for Long-Term Visual Place Recognition

Han, Fei; Yang, Xue; Deng, Yiming; Rentschler, Mark E.; Yang, Dejun; Zhang, Hao

doi:10.1109/lra.2017.2662061

Cited by 45 publications

(41 citation statements)

References 32 publications

“…We demonstrate our approach on three benchmark datasets, which have been extensively tested in recent literature [10], [26], [27]. The datasets are Oxford RobotCar, Nordland, and Gardens Point Walking.…”

Section: Experimental Methods

mentioning

confidence: 99%

Filter Early, Match Late: Improving Network-Based Visual Place Recognition

Hausler

Jacobson

Milford

2019

2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

CNNs have excelled at performing place recognition over time, particularly when the neural network is optimized for localization in the current environmental conditions. In this paper we investigate the concept of feature map filtering, where, rather than using all the activations within a convolutional tensor, only the most useful activations are used. Since specific feature maps encode different visual features, the objective is to remove feature maps that detract from the ability to recognize a location across appearance changes. Our key innovation is to filter the feature maps in an early convolutional layer, but then continue to run the network and extract a feature vector using a later layer in the same network. By filtering early visual features and extracting a feature vector from a higher, more viewpoint-invariant later layer, we demonstrate improved condition and viewpoint invariance. Our approach requires image pairs for training from the deployment environment, but we show that state-of-the-art performance can regularly be achieved with as little as a single training image pair. An exhaustive experimental analysis is performed to determine the full scope of causality between early layer filtering and late layer extraction. For validity, we use three datasets: Oxford RobotCar, Nordland, and Gardens Point, achieving overall superior performance to NetVLAD. The work provides a number of new avenues for exploring CNN optimizations, without full re-training.

“…For quantitative evaluation and comparison, we use precision-recall curves as a metric following (Sunderhauf et al 2015; Zhang, Han, and Wang 2016), where high area under the curve means both high recall (relates to a low false positive rate) and high precision (relates to a low false negative rate). Inspired by the conclusion drawn by (Han et al 2017) that HOG features perform the best among other types of raw visual features, we extract HOG features from landmarks as the input to generate the integrated representation by the proposed method. As is shown in Fig.…”

Section: Study Of the Orthogonality Of The Solutions Of Our New Method

mentioning

confidence: 99%
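The precision-recall evaluation mentioned in the statement above can be sketched as follows. This is a minimal illustration, not the evaluation code of any cited paper: the function names, the list-based inputs, and the convention of declaring a match when the score is at or above the threshold are all assumptions. Under the usual definitions, precision = TP / (TP + FP) drops as false positives accumulate, and recall = TP / (TP + FN) drops as true matches are missed.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when pairs scoring >= threshold are declared matches.

    scores: one similarity score per query/template pair (hypothetical input).
    labels: True where the pair really shows the same place.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y)      # correct matches
    fp = sum(1 for s, y in pairs if s >= threshold and not y)  # wrong matches
    fn = sum(1 for s, y in pairs if s < threshold and y)       # missed matches
    # Conventions for empty denominators are an assumption of this sketch.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pr_curve(scores, labels):
    """Sweep the threshold over every distinct score, highest first."""
    thresholds = sorted(set(scores), reverse=True)
    return [precision_recall(scores, labels, t) for t in thresholds]
```

Calling `pr_curve(scores, labels)` returns one (precision, recall) point per distinct score; the area under those points is the summary number the statement refers to.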

“…After obtaining the learned representation which integrates landmark and holistic information, we can calculate the matching scores by using cosine similarity between query image and each template image in the projected subspace (Naseer et al 2014; Han et al 2017), and then determine whether two locations are matched by comparing the score with a user-defined threshold. Compared with existing long-term place recognition methods that use either holistic information or semantic landmarks only, our new method is more advantageous since it learns an integrated representation that can capture both insights.…”

Section: Visual Place Recognition Via Integrated Image Representations

mentioning

confidence: 99%
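The matching step described in the statement above (cosine similarity between a query and each template, followed by a user-defined threshold) can be sketched as below. This is an illustrative sketch only: the feature vectors are plain Python lists standing in for the projected representations, and `match_place`, its default threshold of 0.8, and the non-empty template list are assumptions rather than details from the cited work.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def match_place(query, templates, threshold=0.8):
    """Return (best_template_index, score), or (None, score) if below threshold.

    Assumes `templates` is a non-empty list of vectors; the threshold value
    is a hypothetical default that would be tuned per dataset.
    """
    scores = [cosine_similarity(query, t) for t in templates]
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] >= threshold:
        return best, scores[best]
    return None, scores[best]
```

Raising the threshold rejects more candidate matches, which generally trades recall for precision; sweeping it produces the precision-recall curve used for evaluation.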

“…From the perspective of features, these methods are based either on local (e.g., SIFT) or global (e.g., HOG or deep features) features; from the perspective of localization cues, these methods can be generally grouped into two categories, based on either holistic layouts or landmarks, respectively. The holistic layout of the environment is typically represented using global features (Han et al 2017; Wu and Rehg 2011) that are learned or manually constructed to encode long-term changes. Very recently, several techniques take advantages of semantic landmarks (e.g., traffic lights and buildings) within the environment as an intermediate representation to address long-term place recognition (Yuan, Chan, and Lee 2011; Sunderhauf et al 2015).…”

mentioning

confidence: 99%

Visual Place Recognition via Robust ℓ2-Norm Distance Based Holism and Landmark Integration

Liu

Wang

Han

et al. 2019

AAAI

Self Cite

Visual place recognition is essential for large-scale simultaneous localization and mapping (SLAM). Long-term robot operations across different times of day, months, and seasons introduce new challenges from significant environment appearance variations. In this paper, we propose a novel method to learn a location representation that can integrate the semantic landmarks of a place with its holistic representation. To promote the robustness of our new model against the drastic appearance variations due to long-term visual changes, we formulate our objective to use non-squared ℓ2-norm distances, which leads to a difficult optimization problem that minimizes the ratio of the ℓ2,1-norms of matrices. To solve our objective, we derive a new efficient iterative algorithm, whose convergence is rigorously guaranteed by theory. In addition, because our solution is strictly orthogonal, the learned location representations can have better place recognition capabilities. We evaluate the proposed method using two large-scale benchmark data sets, the CMU-VL and Nordland data sets. Experimental results have validated the effectiveness of our new method in long-term visual place recognition applications.

“…Thus, most methods based on visual cues used global features, such as GIST (Latif et al 2014), HOG (Naseer et al 2014), and CNN (Sünderhauf et al 2015), to construct representations of the holistic scene in the robot view. Besides using a single type of features, several approaches integrated multiple types of features to encode places (Pronobis et al 2010; Han et al 2017).…”

Section: Representations For Loop Closure Detection

mentioning

confidence: 99%

Long-Term Loop Closure Detection through Visual-Spatial Information Preserving Multi-Order Graph Matching

Gao

Zhang

2020

AAAI

Self Cite

Loop closure detection is a fundamental problem for simultaneous localization and mapping (SLAM) in robotics. Most of the previous methods only consider one type of information, based on either visual appearances or spatial relationships of landmarks. In this paper, we introduce a novel visual-spatial information preserving multi-order graph matching approach for long-term loop closure detection. Our approach constructs a graph representation of a place from an input image to integrate visual-spatial information, including visual appearances of the landmarks and the background environment, as well as the second and third-order spatial relationships between two and three landmarks, respectively. Furthermore, we formulate loop closure detection as a multi-order graph matching problem to compute a similarity score directly from the graph representations of the query and template images, instead of performing conventional vector-based image matching. We evaluate the proposed multi-order graph matching approach on two public long-term loop closure detection benchmark datasets, the St. Lucia and CMU-VL datasets. Experimental results have shown that our approach is effective for long-term loop closure detection and it outperforms the previous state-of-the-art methods.
