2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids)
DOI: 10.1109/humanoids53995.2022.10000200
RoverNet: Vision-Based Adaptive Human-to-Robot Object Handovers

Cited by 4 publications (5 citation statements) | References 19 publications
“…The input data consists of a sequence of T video frames x_{1:T} = (x_1, ..., x_T), force-torque measurements ft_{1:T} = (ft_1, ..., ft_T), and the state of the gripper g_{1:T} = (g_1, ..., g_T), where g_t ∈ {-0.5, 0.0, 0.5} refers to {open, partially closed, closed}, respectively. Additional annotations are the current human action h_{1:T} = (h_1, ..., h_T), where h_t ∈ {idle, approach, interact, retract, post-idle, not released, dropped}, which is available only during training, and the current robot action r_{1:T} = (r_1, ..., r_T), where r_t ∈ {approach, interact, retract}, which is available during both training and inference.…”
Section: Baseline Approaches (mentioning, confidence: 99%)
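The per-timestep structure described in the quoted passage can be sketched as a small Python helper. This is a hypothetical illustration, not code from the paper; all names (`make_timestep`, the label sets, the field names) are assumptions. It shows one timestep bundling a frame x_t, a 6-D force-torque reading ft_t, a discrete gripper state g_t, and the optional action labels h_t and r_t.

```python
# Hypothetical sketch of one timestep of the input data described above.
GRIPPER_STATES = {-0.5: "open", 0.0: "partially closed", 0.5: "closed"}
HUMAN_ACTIONS = {"idle", "approach", "interact", "retract",
                 "post-idle", "not released", "dropped"}
ROBOT_ACTIONS = {"approach", "interact", "retract"}

def make_timestep(frame, ft, g, h=None, r=None):
    """Bundle one timestep; h is optional because human-action labels
    are only available during training."""
    assert len(ft) == 6, "force-torque reading is (fx, fy, fz, tx, ty, tz)"
    assert g in GRIPPER_STATES, "gripper state must be -0.5, 0.0, or 0.5"
    assert h is None or h in HUMAN_ACTIONS
    assert r is None or r in ROBOT_ACTIONS
    return {"x": frame, "ft": list(ft), "g": g, "h": h, "r": r}

# Example: an inference-time sample (no human-action label available).
step = make_timestep(frame=[[0] * 4] * 4,          # stand-in for an RGB frame
                     ft=[0.1, -0.2, 9.8, 0.0, 0.01, 0.0],
                     g=0.0, r="approach")
```

A full training sequence would simply be a list of T such timesteps, with `h` filled in for every step.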
“…Footnote 5: Although all annotations were performed by a single annotator, there might still be some ambiguity in the exact boundaries of the person's actions. Footnote 6: ft is a vector consisting of the force (fx, fy, fz) and the torque (τx, τy, τz) in three directions, expressed in newtons and newton-meters, respectively.…”
Section: A. Video Classification (mentioning, confidence: 99%)
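The footnote's wrench layout, three force components followed by three torque components, can be made concrete with a minimal sketch. The helper name `split_wrench` is an assumption for illustration only.

```python
import math

def split_wrench(ft):
    """Split a 6-D force-torque reading into force (N) and torque (N*m),
    using the (fx, fy, fz, tx, ty, tz) ordering from the footnote."""
    fx, fy, fz, tx, ty, tz = ft
    return (fx, fy, fz), (tx, ty, tz)

force, torque = split_wrench([0.0, 0.0, 9.81, 0.0, 0.1, 0.0])
force_mag = math.sqrt(sum(c * c for c in force))   # magnitude in newtons
```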
“…In this paper, we propose a method to maximize the utilization of existing training data, consisting of input RGB videos and the corresponding labels. In our previous work, we developed an approach for generating object handover behaviors using recurrent neural networks [5], [6], where videos of the giver's motion are used as input to an LSTM network, which computes the necessary receiver's motion for a successful handover. The proposed network can predict either the handover location [6] or complete receiver trajectories [5].…”
Section: Introduction (mentioning, confidence: 99%)
“…To enhance this approach, we introduce a semi-supervised recurrent neural network training technique that employs a Generative Adversarial Network (GAN) with LSTM layers.…”
Section: Introduction (mentioning, confidence: 99%)
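The cited pipeline, per-frame visual features fed through an LSTM whose final state is read out as a handover prediction, can be sketched at the shape level in plain NumPy. This is a toy sketch under stated assumptions, not the authors' network: dimensions, weight initialization, and the single readout to a 3-D handover location are invented for illustration, and the GAN-based semi-supervised training from the quote is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_step(x, h, c, W, U, b):
    """One standard LSTM step; the four gates are stacked in z."""
    z = W @ x + U @ h + b
    H = h.size
    i, f, o = (1.0 / (1.0 + np.exp(-z[k * H:(k + 1) * H])) for k in range(3))
    g = np.tanh(z[3 * H:])           # candidate cell update
    c = f * c + i * g                # new cell state
    h = o * np.tanh(c)               # new hidden state
    return h, c

D, H, T = 8, 16, 10                  # feature size, hidden size, frame count
W = rng.normal(0, 0.1, (4 * H, D))
U = rng.normal(0, 0.1, (4 * H, H))
b = np.zeros(4 * H)
V = rng.normal(0, 0.1, (3, H))       # readout to a 3-D handover location

h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(T, D)):    # stand-in for per-frame visual features
    h, c = lstm_step(x, h, c, W, U, b)
location = V @ h                     # predicted 3-D handover location
```

Predicting complete receiver trajectories, as in the cited work, would replace the single final readout with a per-step output head.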