“…The input data consists of a sequence of T video frames x_{1:T} = (x_1, ..., x_T), force-torque measurements ft_{1:T} = (ft_1, ..., ft_T) with ft_t ∈ ℝ^6, and the gripper state g_{1:T} = (g_1, ..., g_T), where g_t ∈ {-0.5, 0.0, 0.5} refers to {open, partially closed, closed}, respectively. Two additional annotations are provided: the current human action h_{1:T} = (h_1, ..., h_T), where h_t ∈ {idle, approach, interact, retract, post-idle, not released, dropped}, which is available only during training; and the current robot action r_{1:T} = (r_1, ..., r_T), where r_t ∈ {approach, interact, retract}, which is available at both training and inference time.…”
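The per-timestep inputs and label spaces above can be sketched as a minimal Python container. This is a hedged illustration only: the class, field names, and validation are assumptions for clarity, not the authors' actual data format.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Label spaces as described in the text (names are the paper's categories).
HUMAN_ACTIONS = ("idle", "approach", "interact", "retract",
                 "post-idle", "not released", "dropped")   # h_t, training-only
ROBOT_ACTIONS = ("approach", "interact", "retract")        # r_t, train + inference
GRIPPER_STATES = (-0.5, 0.0, 0.5)                          # open, partially closed, closed


@dataclass
class Timestep:
    """One timestep t of the input sequence (hypothetical container)."""
    frame: Sequence                        # video frame x_t (e.g. an HxWxC array)
    force_torque: Sequence[float]          # ft_t: 6 values (fx, fy, fz, tx, ty, tz)
    gripper: float                         # g_t in {-0.5, 0.0, 0.5}
    robot_action: str                      # r_t: available at train and test time
    human_action: Optional[str] = None     # h_t: label only present during training

    def __post_init__(self):
        # Basic consistency checks matching the description in the text.
        assert len(self.force_torque) == 6
        assert self.gripper in GRIPPER_STATES
        assert self.robot_action in ROBOT_ACTIONS
        assert self.human_action is None or self.human_action in HUMAN_ACTIONS
```

At inference time `human_action` would simply stay `None`, reflecting that h_t is a training-only annotation while r_t remains available.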