Image-similarity-based Convolutional Neural Network for Robot Visual Relocalization

Wang, Li; Li, Ruifeng; Sun, Jiangtao; Seah, Hock Soon; Quah, Chee Kwang; Zhao, Lingjuan; Tandianus, Budianto

doi:10.18494/sam.2020.2549

Cited by 4 publications

(8 citation statements)

References 33 publications

(34 reference statements)


“…However, such methods are not applicable for the visual localization of large-scale scenes and are associated with low accuracies [21]. Although some methods [22, 23] have made improvements and the accuracy has been greatly improved, these kinds of method all need to use camera pose data.…”

Section: Related Work (mentioning)

confidence: 99%

A Visual and VAE Based Hierarchical Indoor Localization Method

Jiang

Zou

Chen

et al. 2021

Sensors


Precise localization and pose estimation in indoor environments are commonly employed in a wide range of applications, including robotics, augmented reality, and navigation and positioning services. Such applications can be solved via visual-based localization using a pre-built 3D model. The increase in searching space associated with large scenes can be overcome by retrieving images in advance and subsequently estimating the pose. The majority of current deep learning-based image retrieval methods require labeled data, which increase data annotation costs and complicate the acquisition of data. In this paper, we propose an unsupervised hierarchical indoor localization framework that integrates an unsupervised network variational autoencoder (VAE) with a visual-based Structure-from-Motion (SfM) approach in order to extract global and local features. During the localization process, global features are applied for the image retrieval at the level of the scene map in order to obtain candidate images, and are subsequently used to estimate the pose from 2D-3D matches between query and candidate images. RGB images only are used as the input of the proposed localization system, which is both convenient and challenging. Experimental results reveal that the proposed method can localize images within 0.16 m and 4° in the 7-Scenes data sets and 32.8% within 5 m and 20° in the Baidu data set. Furthermore, our proposed method achieves a higher precision compared to advanced methods.
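The retrieval step described in this abstract (global features select candidate images before pose estimation) can be sketched as a nearest-neighbour search over feature vectors. This is a minimal illustration, not the authors' implementation: the function name `retrieve_candidates`, the use of cosine similarity, and the toy feature vectors are all assumptions, and the VAE encoder that would produce the features is not shown.

```python
import numpy as np


def retrieve_candidates(query_feat, db_feats, k=3):
    """Return indices and similarities of the k database images closest to the query.

    Assumption: similarity is cosine similarity between global feature vectors.
    The abstract does not state which distance the authors use.
    """
    query_feat = np.asarray(query_feat, dtype=float)
    db_feats = np.asarray(db_feats, dtype=float)

    # Normalise so that a dot product equals cosine similarity.
    q = query_feat / np.linalg.norm(query_feat)
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)

    sims = db @ q
    order = np.argsort(-sims)[:k]  # highest similarity first
    return order, sims[order]


# Toy 2-D "global features" for three database images (made-up values).
db = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
idx, sims = retrieve_candidates([1.0, 0.1], db, k=2)
print(idx, sims)
```

In a hierarchical pipeline of this kind, only the retrieved candidates would then be passed to the more expensive 2D-3D matching stage.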


“…In unmanned robot and self-driving car applications, (1)(2)(3)(4) labeling data is one of the most time-consuming tasks. For instance, when daytime and nighttime data are required, data must be collected during different periods and then staff must be assigned to label the data.…”

Section: Introduction (mentioning)

confidence: 99%

Image-to-image Translation via Contour-consistency Networks

Wang

Lin

Hsia

et al. 2022

Sensors and Materials


In this paper, a novel framework for image-to-image translation, in which contour-consistency networks are used to solve the problem of inconsistency between the contours of generated and original images, is proposed. The objective of this study was to address the lack of an adequate training set. At the generator end, the original map is sampled by an encoder to obtain the encoder feature map; the attention feature map is then obtained using the attention module. Using the attention feature map, the decoder can ascertain where more conversions are required. The mechanism at the discriminator end is similar to that at the generator end. The map is sampled through an encoder to obtain the encoder feature map and then converted into the attention feature map. Finally, the map is classified by the classifier as real or fake. Experimental results demonstrate the effectiveness of the proposed method.


“…Since the camera is often fixed on a robot, a main component of vision-based robot relocalization is visual pose estimation in the world coordinate system. Thus, it can be divided into four main types of relocalization methods: measurement-based methods, keyframe-based methods, feature-based methods and learning-based methods [3].…”

Section: Introduction (mentioning)

confidence: 99%

“…For instance, ref. [3] found that pose precision is low when there is a large visual dissimilarity between the testing image and the training set. The authors in [3] presented an image cropping algorithm based on a genetic algorithm to select the most similar image within the training set.…”

Section: Introduction (mentioning)

confidence: 99%

“…Then, it was found in [3] that if the trajectory of the training set and the trajectory of the testing set are visually similar, the relocalization performance will be better on the testing set. However, the method in [3] has a limitation: the cropped positions of the image cropping algorithm based on genetic algorithm are fixed. Thus, the cropped image may not be the one that is most similar to the training set.…”

Section: Introduction (mentioning)

confidence: 99%


Visual Robot Relocalization Based on Multi-Task CNN and Image-Similarity Strategy

Xie

et al. 2020

Sensors

Self Cite


The traditional CNN for 6D robot relocalization, which outputs pose estimations, does not indicate whether the model is making sensible predictions or just guessing at random. We found that convnet representations trained on classification problems generalize well to other tasks. Thus, we propose a multi-task CNN for robot relocalization, which can simultaneously perform pose regression and scene recognition. Scene recognition determines whether the input image belongs to the current scene in which the robot is located, not only reducing the relocalization error but also indicating how much confidence we can place in the prediction. Meanwhile, we found that when there is a large visual difference between testing images and training images, the pose precision becomes low. Based on this, we present the dual-level image-similarity strategy (DLISS), which consists of two levels: an initial level and an iteration level. The initial level performs feature vector clustering in the training set and feature vector acquisition in testing images. The iteration level, namely, the PSO-based image-block selection algorithm, can select the testing images which are the most similar to training images based on the initial level, enabling us to gain higher pose accuracy on the testing set. Our method considers both the accuracy and the robustness of relocalization, and it can operate indoors and outdoors in real time, taking at most 27 ms per frame to compute. Finally, we used the Microsoft 7Scenes dataset and the Cambridge Landmarks dataset to evaluate our method. It obtains approximately 0.33 m and 7.51° accuracy on the 7Scenes dataset, and approximately 1.44 m and 4.83° accuracy on the Cambridge Landmarks dataset. Compared with PoseNet, our CNN reduced the average positional error by 25% and the average angular error by 27.79% on the 7Scenes dataset, and reduced the average positional error by 40% and the average angular error by 28.55% on the Cambridge Landmarks dataset.
We show that our multi-task CNN can localize from high-level features and is robust to images which are not in the current scene. Furthermore, we show that our multi-task CNN gets higher accuracy of relocalization by using testing images obtained by DLISS.
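The accuracy figures quoted in this abstract are positional errors in metres and angular errors in degrees, plus percentage reductions relative to a baseline. The sketch below shows one common way to compute such quantities from a predicted and a ground-truth pose. It assumes orientations are unit quaternions, and the helper names and all numeric inputs are made up for illustration. The abstract does not specify the authors' exact evaluation code.

```python
import numpy as np


def position_error(t_pred, t_true):
    """Euclidean distance between predicted and true camera positions (metres)."""
    diff = np.asarray(t_pred, dtype=float) - np.asarray(t_true, dtype=float)
    return float(np.linalg.norm(diff))


def angular_error_deg(q_pred, q_true):
    """Rotation angle in degrees between two orientations given as quaternions.

    Assumption: quaternions are in (w, x, y, z) order. q and -q encode the same
    rotation, so the absolute value of the dot product is used.
    """
    q1 = np.asarray(q_pred, dtype=float)
    q2 = np.asarray(q_true, dtype=float)
    q1 = q1 / np.linalg.norm(q1)
    q2 = q2 / np.linalg.norm(q2)
    d = min(1.0, abs(float(np.dot(q1, q2))))  # clamp against rounding error
    return float(np.degrees(2.0 * np.arccos(d)))


def relative_reduction(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Illustrative values only; none of these are taken from the paper.
print(position_error([0.0, 0.0, 0.0], [3.0, 4.0, 0.0]))
identity = [1.0, 0.0, 0.0, 0.0]
quarter_turn_z = [np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)]
print(angular_error_deg(identity, quarter_turn_z))
print(relative_reduction(0.44, 0.33))
```

Per-frame errors computed this way are usually aggregated over a test sequence (for example by mean or median) before being reported.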
