Visual simultaneous localization and mapping (SLAM) has attracted considerable attention over the past few years. In this paper, a comprehensive survey of the state-of-the-art feature-based visual SLAM approaches is presented. The reviewed approaches are classified based on the visual features observed in the environment. Visual features can be seen at different levels: low-level features like points and edges, middle-level features like planes and blobs, and high-level features like semantically labeled objects. One of the most critical research gaps regarding visual SLAM approaches concluded from this study is the lack of generality. Some approaches exhibit a very high level of maturity in terms of accuracy and efficiency. Yet, they are tailored to very specific environments, such as feature-rich and static environments. When operating in different environments, such approaches experience severe degradation in performance. In addition, due to software and hardware limitations, guaranteeing a robust visual SLAM approach is extremely challenging. Although semantics have been heavily exploited in visual SLAM, scene understanding that incorporates relationships between features is not yet fully explored. A detailed discussion of such research challenges is provided throughout the paper.
Neuromorphic vision is a bio-inspired technology that has triggered a paradigm shift in the computer vision community and is serving as a key enabler for a wide range of applications. This technology has offered significant advantages, including reduced power consumption, reduced processing needs, and communication speedups. However, neuromorphic cameras suffer from significant amounts of measurement noise. This noise deteriorates the performance of neuromorphic event-based perception and navigation algorithms. In this article, we propose a novel noise filtration algorithm to eliminate events that do not represent real log-intensity variations in the observed scene. We employ a graph neural network (GNN)-driven transformer algorithm, called GNN-Transformer, to classify every active event pixel in the raw stream into real log-intensity variation or noise. Within the GNN, a message-passing framework, referred to as EventConv, is carried out to reflect the spatiotemporal correlation among the events while preserving their asynchronous nature. We also introduce the known-object ground-truth labeling (KoGTL) approach for generating approximate ground-truth labels of event streams under various illumination conditions. KoGTL is used to generate labeled datasets, from experiments recorded in challenging lighting conditions, including moonlight. These datasets are used to train and extensively test our proposed algorithm. When tested on unseen datasets, the proposed algorithm outperforms state-of-the-art methods by at least 8.8% in terms of filtration accuracy. Additional tests are also conducted on publicly available datasets (ETH Zürich Color-DAVIS346 datasets) to demonstrate the generalization capabilities of the proposed algorithm in the presence of illumination variations and different motion dynamics. Compared to state-of-the-art solutions, qualitative results verified the superior capability of the proposed algorithm to eliminate noise while preserving meaningful events in the scene.
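The core idea behind EventConv-style message passing can be illustrated without any of the paper's internals: events caused by real log-intensity changes tend to have spatiotemporal neighbors, while noise events tend to be isolated. The sketch below is a minimal, hypothetical approximation of that intuition; the radius, time window, event format, and mean-aggregation rule are all assumptions for illustration, not the published GNN-Transformer architecture.

```python
import numpy as np

def spatiotemporal_neighbors(events, idx, r_px=3, dt=5e-3):
    """Indices of events within a spatial radius (pixels) and time window
    (seconds) of events[idx]. events: array of (x, y, t, polarity) rows."""
    x, y, t, _ = events[idx]
    spatial = (np.abs(events[:, 0] - x) <= r_px) & (np.abs(events[:, 1] - y) <= r_px)
    temporal = np.abs(events[:, 2] - t) <= dt
    mask = spatial & temporal
    mask[idx] = False  # an event is not its own neighbor
    return np.flatnonzero(mask)

def message_pass(events, features, r_px=3, dt=5e-3):
    """One illustrative message-passing round: each event's feature is
    blended with the mean of its spatiotemporal neighbors' features.
    Isolated events (no neighbors) keep their feature unchanged."""
    updated = features.copy()
    for i in range(len(events)):
        nbrs = spatiotemporal_neighbors(events, i, r_px, dt)
        if len(nbrs):
            updated[i] = 0.5 * features[i] + 0.5 * features[nbrs].mean(axis=0)
    return updated

# Toy stream: a dense cluster (likely a real intensity change)
# and one isolated event (likely noise).
events = np.array([
    [10.0, 10.0, 0.000, 1.0],
    [11.0, 10.0, 0.001, 1.0],
    [10.0, 11.0, 0.002, 1.0],
    [50.0, 50.0, 0.010, 1.0],  # far from everything -> no neighbors
])
features = np.ones((len(events), 1))
out = message_pass(events, features)
```

A real event-denoising network would learn the aggregation and run a classifier on the resulting per-event features; this sketch only shows why a neighborhood graph over (x, y, t) exposes the signal/noise distinction while keeping events asynchronous (no frame accumulation is performed).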
The ability to decide if a solution to a pose-graph problem is globally optimal is of high significance for safety-critical applications. Converging to a local minimum may result in severe estimation errors along the estimated trajectory. In this paper, we propose a graph neural network based on a novel implementation of a graph convolutional-like layer, called PoseConv, to perform classification of pose-graphs as optimal or sub-optimal. The operation of PoseConv required incorporating a new node feature, referred to as cost, to hold the information that the nodes will communicate. A training and testing dataset was generated based on publicly available benchmarking pose-graphs. The neural classifier is then trained and extensively tested on several subsets of the pose-graph samples in the dataset. Testing results have proven the model's capability to perform classification with 92−98% accuracy for the different partitions of the training and testing dataset. In addition, the model was able to generalize to previously unseen variants of pose-graphs in the training dataset. Our method trades a small amount of accuracy for a large improvement in processing time. This makes it faster than other existing methods by up to three orders of magnitude, which could be of paramount importance when using computationally-limited robots overseen by human operators.
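The "cost" node feature described above can be grounded with a small example: in a 2-D pose-graph, each edge carries a measured relative pose, and the residual between that measurement and the relative pose implied by the current node estimates yields a per-edge error that can be attributed to nodes. The sketch below is a hypothetical illustration of such a feature, not the paper's PoseConv layer; the pose parameterization (x, y, theta), the squared-error cost, and the even split of edge cost between endpoints are all assumptions.

```python
import numpy as np

def pose_residual_cost(pose_i, pose_j, measured_ij):
    """Per-edge cost: squared error between the relative pose implied by
    the node estimates and the measured relative pose. Poses are
    (x, y, theta); the angle residual is wrapped to (-pi, pi]."""
    dx, dy = pose_j[:2] - pose_i[:2]
    dtheta = (pose_j[2] - pose_i[2] - measured_ij[2] + np.pi) % (2 * np.pi) - np.pi
    return (dx - measured_ij[0]) ** 2 + (dy - measured_ij[1]) ** 2 + dtheta ** 2

def node_costs(poses, edges, measurements):
    """Accumulate each node's share of edge costs -- the kind of scalar
    'cost' feature a PoseConv-like layer could propagate and a classifier
    could read out to judge optimality."""
    costs = np.zeros(len(poses))
    for (i, j), z in zip(edges, measurements):
        c = pose_residual_cost(poses[i], poses[j], z)
        costs[i] += 0.5 * c  # split the edge cost between both endpoints
        costs[j] += 0.5 * c
    return costs

# Consistent toy pose-graph: node estimates agree with the single edge
# measurement, so every node's cost is zero (an optimal configuration).
poses = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
edges = [(0, 1)]
measurements = [np.array([1.0, 0.0, 0.0])]
costs = node_costs(poses, edges, measurements)
```

The appeal of a learned classifier over such features is speed: computing per-node costs and one forward pass is cheap compared to certifying global optimality directly, which matches the abstract's accuracy-for-speed trade-off.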