Empty Cities: A Dynamic-Object-Invariant Space for Visual SLAM

Bescos, Berta; Cadena, César; Neira, José

doi:10.1109/tro.2020.3031267

Cited by 28 publications

(14 citation statements)

References 57 publications

(139 reference statements)

“…In contrast to applying clustering algorithms to low-level features, high-level features facilitate the clustering of map points belonging to independent objects with different dynamics, as well as the potential for detecting dynamic objects in one shot [100]. For rigid objects, features with the same semantic label always have the same motion label.…”

Section: Using High-level Features In Dynamic SLAM (mentioning)

confidence: 99%

A survey: which features are required for dynamic visual simultaneous localization and mapping?

Zheng

2021

Vis. Comput. Ind. Biomed. Art

In recent years, simultaneous localization and mapping in dynamic environments (dynamic SLAM) has attracted significant attention from both academia and industry. Some pioneering work on this technique has expanded the potential of robotic applications. Compared to standard SLAM under the static world assumption, dynamic SLAM divides features into static and dynamic categories and leverages each type of feature properly. Therefore, dynamic SLAM can provide more robust localization for intelligent robots that operate in complex dynamic environments. Additionally, to meet the demands of some high-level tasks, dynamic SLAM can be integrated with multiple object tracking. This article presents a survey on dynamic SLAM from the perspective of feature choices. A discussion of the advantages and disadvantages of different visual features is provided in this article.

“…The authors of Mask-SLAM and DynaSLAM evaluate proposal methods on their original dataset recorded in dynamic environments. Empty Cities [29] integrates dynamic object detection with a generative adversarial model to inpaint the dynamic objects and generate static scenes from images in dynamic environments.…”

Section: Vision-based Localization In Dynamic Environments (mentioning)

confidence: 99%

VIODE: A Simulated Dataset to Address the Challenges of Visual-Inertial Odometry in Dynamic Environments

Minoda, Schilling, Wüest et al.

2021

Preprint

Dynamic environments such as urban areas are still challenging for popular visual-inertial odometry (VIO) algorithms. Existing datasets typically fail to capture the dynamic nature of these environments, therefore making it difficult to quantitatively evaluate the robustness of existing VIO methods. To address this issue, we propose three contributions: firstly, we provide the VIODE benchmark, a novel dataset recorded from a simulated UAV that navigates in challenging dynamic environments. The unique feature of the VIODE dataset is the systematic introduction of moving objects into the scenes. It includes three environments, each of which is available in four dynamic levels that progressively add moving objects. The dataset contains synchronized stereo images and IMU data, as well as ground-truth trajectories and instance segmentation masks. Secondly, we compare state-of-the-art VIO algorithms on the VIODE dataset and show that they display substantial performance degradation in highly dynamic scenes. Thirdly, we propose a simple extension for visual localization algorithms that relies on semantic information. Our results show that scene semantics are an effective way to mitigate the adverse effects of dynamic objects on VIO algorithms. Finally, we make the VIODE dataset publicly available at https://github.com/kminoda/VIODE.

“…In this work, we share a similar line of thought with Berta et al [11], [14], but move one step forward to build a multi-modal dynamics-invariant perception space to improve feature matching in dynamic environments. This space is built by first designing a novel deep neural network architecture to reconstruct the static semantics (i.e., static semantic segmentation map) and static images from the dynamic images in a sequential manner.…”

Section: Introduction (mentioning)

confidence: 97%

“…The most related work is by Berta et al [11], who improve Pix2Pix [12] by performing conditioning on both the dynamic image and its dynamic mask under cGAN [13], to recover realistic static images. Recently, Berta et al [14] implement two more losses based on image steganalysis techniques and ORB features, respectively, to better recover reliable features.…”

Section: Introduction (mentioning)

confidence: 99%

“…On one hand, simply discarding dynamic contents as done in [8], [9] reduces the amount of available features and may cause failures in feature matching when the dynamic portions of the image tends to dominate the whole image in feature space. On the other hand, although those dynamic-to-static image translation approaches [11], [14] are capable of generating visually realistic images, they easily introduce blur and artifacts, especially in areas associated with moving objects. Extracting features on such recovered static images will degrade feature matching to some extent.…”

Section: Introduction (mentioning)

confidence: 99%

Multi-Modal Visual Place Recognition in Dynamics-Invariant Perception Space

Wu, Wang, Sun

2021

IEEE Signal Process. Lett.

Visual place recognition is one of the essential and challenging problems in the fields of robotics. In this letter, we for the first time explore the use of multi-modal fusion of semantic and visual modalities in dynamics-invariant space to improve place recognition in dynamic environments. We achieve this by first designing a novel deep learning architecture to generate the static semantic segmentation and recover the static image directly from the corresponding dynamic image. We then innovatively leverage the spatial-pyramid-matching model to encode the static semantic segmentation into feature vectors. In parallel, the static image is encoded using the popular Bag-of-words model. On the basis of the above multi-modal features, we finally measure the similarity between the query image and target landmark by the joint similarity of their semantic and visual codes. Extensive experiments demonstrate the effectiveness and robustness of the proposed approach for place recognition in dynamic environments.
