Uncertainty-Aware Self-Supervised Learning of Spatial Perception Tasks

Nava, Mirko; Paolillo, Antonio; Guzzi, Jérôme; Gambardella, Luca Maria; Giusti, Alessandro

doi:10.1109/lra.2021.3095269

Cited by 15 publications (15 citation statements)

References 23 publications

“…Zeng et al [10] perform multi-view object segmentation to generate training data for object pose estimation. Nava et al [22] exploit noisy state estimates to assist self-supervised learning of spatial perception models.…”

Section: B Multi-view Object-based Perception (mentioning)

SLAM-Supported Self-Training for 6D Object Pose Estimation

Lü, Zhang, Doherty, et al. 2022 (Preprint)

Recent progress in learning-based object pose estimation paves the way for developing richer object-level world representations. However, the estimators, often trained with out-of-domain data, can suffer performance degradation when deployed in novel environments. To address the problem, we present a SLAM-supported self-training procedure to autonomously improve robot object pose estimation ability during navigation. Combining the network predictions with robot odometry, we can build a consistent object-level environment map via pose graph optimization (PGO). Exploiting the state estimates from PGO, we pseudo-label robot-collected RGB images to fine-tune the pose estimators. Unfortunately, it is difficult to quantify the uncertainty of the estimator predictions, and leaving the uncertainty of the data used for PGO unmodeled can result in low-quality object pose estimates. An automatic covariance tuning method is developed for robust PGO by allowing the measurement uncertainty models to change as part of the optimization process. The formulation permits a straightforward alternating minimization procedure that re-scales covariances analytically and component-wise, enabling more flexible noise modeling for learning-based measurements. We test our method with the deep object pose estimator (DOPE) on the YCB video dataset and in real-world robot experiments. The method can achieve significant performance gains in pose estimation and, in return, facilitates the success of object SLAM.
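The alternating minimization with analytic, component-wise covariance re-scaling mentioned in this abstract can be illustrated on a toy problem. The sketch below is not the PGO formulation of Lü et al.; it assumes a much simpler setting: estimating one fixed D-dimensional point from several groups of measurements (for example a precise source and a noisy, learning-based source), where each group has its own unknown variance per component. With the variances fixed, the weighted least-squares estimate is a precision-weighted average; with the estimate fixed, the variance that minimizes the Gaussian negative log-likelihood has the closed form of the mean squared residual. The function name, the unit-variance initialization, and the `var_floor` clamp (which stops a group with near-zero residuals from receiving unbounded weight) are hypothetical choices for illustration, and the data are invented.

```python
def alternating_covariance_tuning(groups, iters=50, var_floor=1e-6):
    """Toy alternating minimization (hypothetical helper, not the paper's PGO code).

    groups: list of measurement groups; each group is a list of D-dimensional
    tuples observing the same unknown point. Every (group, component) pair
    gets its own variance, which is re-estimated in closed form.
    """
    dim = len(groups[0][0])
    variances = [[1.0] * dim for _ in groups]  # initial guess: unit variance
    estimate = [0.0] * dim
    for _ in range(iters):
        # Step 1 (variances fixed): the weighted least-squares solution for each
        # component is the precision-weighted average of all measurements.
        for k in range(dim):
            num = 0.0
            den = 0.0
            for g, meas in enumerate(groups):
                precision = 1.0 / variances[g][k]
                num += precision * sum(z[k] for z in meas)
                den += precision * len(meas)
            estimate[k] = num / den
        # Step 2 (estimate fixed): the variance that minimizes the Gaussian
        # negative log-likelihood is the mean squared residual, computed
        # independently for every group and component, then clamped from below.
        for g, meas in enumerate(groups):
            for k in range(dim):
                mse = sum((z[k] - estimate[k]) ** 2 for z in meas) / len(meas)
                variances[g][k] = max(mse, var_floor)
    return estimate, variances


# Invented example data: a precise source and a noisy source observing the
# same 2D point.
precise = [(1.01, 2.02), (0.99, 1.98), (1.02, 1.99), (0.98, 2.01)]
noisy = [(1.5, 2.6), (0.4, 1.3), (1.6, 2.5), (0.7, 1.2)]
estimate, variances = alternating_covariance_tuning([precise, noisy])
```

Since each step exactly minimizes the same negative log-likelihood over one block of variables (the clamped variance is still the best value above the floor), the objective cannot increase between iterations. In this toy run the noisy group ends up with variances several orders of magnitude larger than the precise group, so it has little influence on the estimate.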

“…the drone. Perception approaches for robotic arms focus on identifying and localizing an object to be grasped [1], [21], [22], [23]. For example, Pinto et al [1] predict the probability of success of a grasp attempt, considering the optimal angle for approaching the object.…”

Section: Related Work (mentioning)

“…With a similar camera configuration, Tobin et al [21] learn to estimate the location of known geometrical objects (e.g., pyramids, cones, cylinders, and cubes); Zeng et al [22] estimate the full 3D pose of objects from the feed of multiple fixed-in-space cameras. In contrast, Nava et al [23] learn the 3D pose of objects using only an uncalibrated monocular camera attached to the end-effector.…”

Section: Related Work (mentioning)

“…In contrast, our approach is formulated for the specific problem of learning to interpret high-dimensional data, achieving a spatial understanding of the scene, without the direct coupling of perception and action into a single model. In our robot arm use case (A2O), we build upon the work by Nava et al [23] for the problem formulation and baseline CNN architecture. In our two nano-drone use cases, whose goal is to estimate either a peer drone's (D2D) or a human's pose (D2H), we base our work upon PULP-Frontnet [2].…”

Section: Related Work (mentioning)

“…We use a custom MobileNetV2 [23] network, which we further extend with an MLP that pre-processes the robot's state before concatenating it with image features, as shown in Figure 2-A. To collect our dataset, we use the Gazebo simulator, where we render multiple environments using a domain randomization technique [21]. Specifically, we randomize the pose of the object of interest; the scale, color, and pose of the decoy objects; the texture of the working surface; and the scene lighting direction and intensity.…”

Section: B Robot Arm-to-object: A2O (mentioning)

Vision-State Fusion: Improving Deep Neural Networks for Autonomous Robotics

Cereda, Bonato, Nava, et al. 2022 (Preprint, self-citation)

Vision-based perception tasks fulfill a paramount role in robotics, facilitating solutions to many challenging scenarios, such as acrobatic maneuvers of autonomous unmanned aerial vehicles (UAVs) and robot-assisted high-precision surgery. Most control-oriented and egocentric perception problems are commonly solved by taking advantage of the robot state estimation as an auxiliary input, particularly when artificial intelligence comes into the picture. In this work, we propose to apply a similar approach for the first time, to the best of our knowledge, to allocentric perception tasks, where the target variables refer to an external subject. We prove how our general and intuitive methodology improves the regression performance of deep convolutional neural networks (CNNs) on ambiguous problems such as allocentric 3D pose estimation. By analyzing three highly different use cases, spanning from grasping with a robotic arm to following a human subject with a pocket-sized UAV, our results consistently improve the R² metric by up to +0.514 compared to their stateless baselines. Finally, we validate the in-field performance of a closed-loop autonomous pocket-sized UAV in the human pose estimation task. Our results show a significant reduction, i.e., 24% on average, in the mean absolute error of our stateful CNN. Supplementary material: in-field testing video at https://youtu.be/LX0seyXWQKI.
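The core idea of this abstract, feeding the robot's state to the network alongside the image, is described in the A2O excerpt as an MLP that pre-processes the robot's state before its output is concatenated with image features. The framework-free sketch below illustrates that concatenation-based fusion under stated assumptions; it is not the authors' network. The CNN backbone is omitted (its pooled features are passed in as a plain list), and the two-layer state MLP, the layer sizes, the uniform initialization, the 4-value output, and all names (`StatefulHead`, `init_layer`, `linear`, `relu`) are hypothetical.

```python
import random


def relu(v):
    # Element-wise rectified linear unit.
    return [max(0.0, a) for a in v]


def linear(weights, bias, x):
    # Dense layer: one row of weights per output unit.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    # Hypothetical small uniform initialization with zero bias.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


class StatefulHead:
    """Hypothetical fusion head: image features + MLP-encoded robot state -> pose."""

    def __init__(self, feat_dim, state_dim, state_hidden, out_dim, seed=0):
        rng = random.Random(seed)
        self.state_fc1 = init_layer(state_dim, state_hidden, rng)
        self.state_fc2 = init_layer(state_hidden, state_hidden, rng)
        # The regression head sees the concatenation of both feature vectors.
        self.head = init_layer(feat_dim + state_hidden, out_dim, rng)

    def forward(self, image_features, robot_state):
        # Pre-process the robot state with a small MLP.
        s = relu(linear(*self.state_fc1, robot_state))
        s = relu(linear(*self.state_fc2, s))
        # Fuse by concatenation, then regress the target pose.
        fused = list(image_features) + s
        return linear(*self.head, fused)


head = StatefulHead(feat_dim=8, state_dim=4, state_hidden=6, out_dim=4, seed=0)
image_features = [0.1] * 8  # stand-in for a CNN backbone's pooled feature vector
robot_state = [0.0, 0.0, 1.0, 0.2]  # hypothetical state, e.g. position and yaw
pose = head.forward(image_features, robot_state)
```

Encoding the state with its own MLP before concatenation lets the state features be scaled and mixed independently of the image features; a stateless baseline would simply drop the state branch and feed `image_features` to the head alone.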

Vision-state Fusion: Improving Deep Neural Networks for Autonomous Robotics

Cereda, Bonato, Nava, et al. 2024, J Intell Robot Syst (self-citation)

Vision-based deep learning perception fulfills a paramount role in robotics, facilitating solutions to many challenging scenarios, such as acrobatic maneuvers of autonomous unmanned aerial vehicles (UAVs) and robot-assisted high-precision surgery. Control-oriented end-to-end perception approaches, which directly output control variables for the robot, commonly take advantage of the robot’s state estimation as an auxiliary input. When intermediate outputs are estimated and fed to a lower-level controller, i.e., mediated approaches, the robot’s state is commonly used as an input only for egocentric tasks, which estimate physical properties of the robot itself. In this work, we propose to apply a similar approach for the first time, to the best of our knowledge, to non-egocentric mediated tasks, where the estimated outputs refer to an external subject. We prove how our general methodology improves the regression performance of deep convolutional neural networks (CNNs) on a broad class of non-egocentric 3D pose estimation problems, with minimal computational cost. By analyzing three highly different use cases, spanning from grasping with a robotic arm to following a human subject with a pocket-sized UAV, our results consistently improve the R² regression metric by up to +0.51 compared to their stateless baselines. Finally, we validate the in-field performance of a closed-loop autonomous cm-scale UAV on the human pose estimation task. Our results show a significant reduction, i.e., 24% on average, in the mean absolute error of our stateful CNN, compared to a State-of-the-Art stateless counterpart.
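Both abstracts report results as a gain in the coefficient of determination, R² = 1 − SS_res / SS_tot, and as a relative reduction of the mean absolute error. The sketch below shows how these quantities are computed for a single scalar output (for example one pose component); how the paper aggregates them across outputs and use cases is not stated here, so per-variable use is an assumption. The function names are hypothetical, and the toy predictions are invented purely for illustration; they are not results from either paper.

```python
def r2_score(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    # Mean absolute error.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline, improved):
    # Fraction by which an error metric drops relative to the baseline.
    return (baseline - improved) / baseline


# Invented toy values for one output variable.
y_true = [0.0, 1.0, 2.0, 3.0]
stateless = [0.5, 0.5, 2.5, 2.5]
stateful = [0.2, 0.8, 2.2, 2.8]

r2_stateless = r2_score(y_true, stateless)
r2_stateful = r2_score(y_true, stateful)
r2_gain = r2_stateful - r2_stateless
mae_drop = relative_reduction(mae(y_true, stateless), mae(y_true, stateful))
```

Note that an R² gain is an absolute difference between two scores, whereas the MAE figure is a relative change, so the two numbers in the abstracts are not directly comparable.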
