2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2019
DOI: 10.1109/iros40897.2019.8967803
Deep Sensor Fusion for Real-Time Odometry Estimation

Abstract: Cameras and 2D laser scanners, in combination, are able to provide low-cost, light-weight and accurate solutions, which make their fusion well-suited for many robot navigation tasks. However, correct data fusion depends on precise calibration of the rigid body transform between the sensors. In this paper we present the first framework that makes use of Convolutional Neural Networks (CNNs) for odometry estimation fusing 2D laser scanners and mono-cameras. The use of CNNs provides the tools to not only extract t…

Cited by 10 publications (7 citation statements) | References 27 publications
“…When provided with data from multiple sensors, a common approach in the literature is to fuse readings coming from different sensors together [28], [29]. To explore this option, we train the Sensor Fusion (SF) on the task of regressing the drone position from both image and audio features, using only instances in T .…”
Section: Alternative Strategies
confidence: 99%
“…Most of the strategies considered in this work employ a CNN architecture based on MobileNet-V2 [31], with a total of 1 million parameters, and a variable number of output neurons dependent on the chosen strategy (3 for strictly supervised approaches, 12 for the SaP model). The SF model utilizes the same convolutional architecture for the image branch, while a series of 4 feed-forward layers with ReLU non-linearities processes the audio information and 3 more feed-forward layers fuse the two streams, similarly to [29]. The AaP model implements an encoder-decoder CNN architecture, with a bottleneck of size 128 [1].…”
Section: E. Neural Network Architectures and Training
confidence: 99%
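The late-fusion scheme described in the statement above (a dedicated branch per modality, then feed-forward layers over the concatenated streams) can be sketched as follows. This is a minimal NumPy sketch, not the cited implementation: all layer widths, the 256-dimensional image feature, and the 32-dimensional audio feature are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

def dense(n_in, n_out):
    # Small random weights and zero bias; a real model would train these.
    return rng.standard_normal((n_in, n_out)) * 0.1, np.zeros(n_out)

# Audio branch: 4 feed-forward layers with ReLU non-linearities.
audio_layers = [dense(32, 64), dense(64, 64), dense(64, 64), dense(64, 128)]
# Fusion head: 3 feed-forward layers regressing a 3-D position.
fusion_layers = [dense(256 + 128, 128), dense(128, 64), dense(64, 3)]

def forward(img_feat, audio_feat):
    a = audio_feat
    for w, b in audio_layers:
        a = relu(a @ w + b)
    h = np.concatenate([img_feat, a])   # fuse the two streams
    for w, b in fusion_layers[:-1]:
        h = relu(h @ w + b)
    w, b = fusion_layers[-1]
    return h @ w + b                    # linear output: (x, y, z)

pos = forward(rng.standard_normal(256), rng.standard_normal(32))
print(pos.shape)   # (3,)
```

Concatenation at the feature level keeps each branch specialized for its modality while letting the fusion head learn cross-modal correlations.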
“…However, traditional methods need accurate models and careful calibration, so employing machine learning for sensor fusion in odometry has also become an open research topic. In [299], sequences of CNNs were used to extract features and determine pose from a camera and 2D LiDAR. Some learning-based methods, such as VINet and DeepVIO [300], demonstrate comparable or even better performance than traditional methods.…”
Section: Sensor Fusion
confidence: 99%
“…This data is then sequentially fed to a 1D convolution layer. We use six 1D convolutional layers following the method in [41]. Each convolutional layer is followed by a ReLU activation.…”
Section: Camera Feature Extraction
confidence: 99%
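The stack described in the statement above (six 1D convolutional layers, each followed by a ReLU) can be sketched as below. This is a hedged NumPy sketch: the channel sizes, kernel width of 3, and input length of 64 are assumptions for illustration, not values taken from the cited paper.

```python
import numpy as np

def conv1d(x, w, b):
    # x: (c_in, t); w: (c_out, c_in, k); b: (c_out,); 'valid' convolution.
    c_out, c_in, k = w.shape
    t_out = x.shape[1] - k + 1
    y = np.zeros((c_out, t_out))
    for i in range(t_out):
        # Contract over input channels and the kernel window.
        y[:, i] = np.tensordot(w, x[:, i:i + k], axes=([1, 2], [0, 1])) + b
    return y

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
channels = [1, 8, 8, 16, 16, 32, 32]   # hypothetical channel progression
kernel = 3
layers = [(rng.standard_normal((co, ci, kernel)) * 0.1, np.zeros(co))
          for ci, co in zip(channels[:-1], channels[1:])]

x = rng.standard_normal((1, 64))       # one input channel, 64 time steps
for w, b in layers:                    # six conv layers, ReLU after each
    x = relu(conv1d(x, w, b))
print(x.shape)   # (32, 52): each 'valid' k=3 layer shortens t by 2
```

With 'valid' padding, each layer trims the temporal dimension by k − 1, so six layers with k = 3 reduce 64 steps to 52; a 'same'-padded variant would preserve the length instead.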