We propose imitation learning for dexterous manipulation from human demonstration videos. We record human videos of manipulation tasks (1st row) and estimate 3D hand-object poses from the videos (2nd row) to construct the demonstrations. A paired simulation system provides the same dexterous manipulation tasks for a multi-finger robot hand (3rd row), including relocate, pour, and place inside, which we solve using imitation learning with the inferred demonstrations.
ing the small pose regime in the pose-canonicalized point clouds, our method integrates the best of both worlds by combining dense coordinate prediction and direct rotation regression, yielding an end-to-end differentiable pipeline optimized for 9DoF pose accuracy (without using non-differentiable RANSAC). Our extensive experiments demonstrate that our method achieves new state-of-the-art performance on category-level rigid object pose (NOCS-REAL275 [29]) and articulated object pose benchmarks (SAPIEN [34], BMVC [18]) while running at the fastest speed, about 12 FPS.
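To make the idea of combining dense coordinate prediction with direct rotation regression concrete, here is a minimal NumPy sketch. It is not the authors' pipeline. It assumes a 9DoF pose is parameterized as a rotation R, a translation t, and a per-axis scale s, so that an observed point is x = R (s * p) + t for a canonical coordinate p. Given a regressed R and predicted canonical coordinates, s and t then follow from a closed-form least-squares fit, which involves no RANSAC and is differentiable. The function names `apply_pose` and `fit_scale_translation` are hypothetical.

```python
import numpy as np


def apply_pose(canon, R, t, s):
    """Map canonical points (N, 3) to the camera frame: x = R (s * p) + t."""
    return (canon * s) @ R.T + t


def fit_scale_translation(canon, observed, R):
    """Least-squares per-axis scale and translation for a known rotation.

    Assumes observed[i] ~= R @ (s * canon[i]) + t. Rotating the observations
    back by R^T decouples the three axes, so each axis is a 1D line fit.
    """
    y = observed @ R                      # each row is R^T x
    p_mean = canon.mean(axis=0)
    y_mean = y.mean(axis=0)
    p_c = canon - p_mean
    y_c = y - y_mean
    # Slope of the per-axis regression of y on p.
    s = (p_c * y_c).sum(axis=0) / (p_c ** 2).sum(axis=0)
    c = y_mean - s * p_mean               # c = R^T t
    t = R @ c
    return s, t


def rotation_z(angle):
    """Rotation about the z axis, used only to build a toy example."""
    ca, sa = np.cos(angle), np.sin(angle)
    return np.array([[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]])


# Toy check on synthetic, noise-free correspondences.
rng = np.random.default_rng(0)
canon_pts = rng.uniform(-0.5, 0.5, size=(200, 3))
R_true = rotation_z(0.3)
s_true = np.array([0.3, 0.5, 0.2])
t_true = np.array([0.1, -0.2, 1.0])
observed_pts = apply_pose(canon_pts, R_true, t_true, s_true)
s_est, t_est = fit_scale_translation(canon_pts, observed_pts, R_true)
```

On noise-free data the fit recovers the scale and translation used to generate the points. With noisy coordinate predictions it returns the least-squares estimate for the given rotation.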