ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation via Online Exploration and Synthesis

Li, Kailin; Yang, Lei; Zhan, Xinyu; Lv, Jun; Xu, Wenqiang; Li, Jiefeng; Lu, Cewu

doi:10.48550/arxiv.2109.05488

Cited by 7 publications

(16 citation statements)

References 47 publications

“…Most previous works tackle 3D hand pose estimation [17,25,40,50,47] and object pose estimation [27,31,44,49] separately. Recently joint hand-object pose estimation has received more focus [14,26,28,12,8,13,11] due to the strong correlation when hands interact with objects. For learning-based methods, Hasson et al. [14] propose attraction and repulsion losses to penalize physically implausible reconstructions.…”

Section: Hand-object Pose Estimation (mentioning)

confidence: 99%

“…Hasson et al. [12] extend to video inputs by leveraging photometric and temporal consistency on sparsely annotated data. To tackle the lack of 3D ground truth, Kailin et al. [26] introduce an online synthesis and exploration module to generate synthetic hand-object poses from a predefined set of plausible grasps during training. In contrast to the above works, optimization-based methods [13,48,10] formulate the task by firstly estimating initial hand and object poses in isolation, then jointly refining them with contact constraints.…”

Section: Hand-object Pose Estimation (mentioning)

confidence: 99%

“…L is the quantization level. D is the depth radius relative to the wrist joint estimated from the training data, and r_z is the wrist joint depth, which is assumed to be known [26,40] to resolve the scale ambiguity in the single view input. Given the camera intrinsic K, pixel coordinates, and depth, we can easily recover the 3D vertex's Euclidean coordinates in the camera space.…”

Section: Hand Pose Estimation (mentioning)

confidence: 99%
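
The last step of the statement above (recovering camera-space coordinates from the intrinsic K, a pixel location, and a depth) can be illustrated with standard pinhole back-projection. This is only a minimal sketch: it assumes a zero-skew 3x3 intrinsic matrix, and the function name `backproject` and the sample numbers are hypothetical, not taken from the cited paper. How the quantized depth is converted to an absolute depth is not given in the excerpt, so the sketch takes the absolute depth z as an input.

```python
def backproject(u, v, z, K):
    """Back-project pixel (u, v) at depth z into camera-space (X, Y, Z).

    K is a 3x3 pinhole intrinsic matrix given as nested lists.
    Assumption: zero skew, so only fx, fy, cx, cy are used.
    """
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)


# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
K = [[500.0, 0.0, 320.0],
     [0.0, 500.0, 240.0],
     [0.0, 0.0, 1.0]]

# A pixel 100 px right of and below the principal point, at depth 0.5.
point = backproject(420.0, 340.0, 0.5, K)
```

A pixel at the principal point maps onto the optical axis, so its X and Y are zero at any depth.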

“…While the former methods [48,13,10] generalize to diverse object classes, the optimization process requires multiple iterations to converge, which is not applicable for real-time applications like XR. In contrast, learning-based methods [26,14,12,8,11] can achieve real-time inference. Motivated by the optimization-based methods, soft contact losses are introduced [14,12] to implicitly guide the network to pursue plausible hand-object interaction.…”

Section: Introduction (mentioning)

confidence: 99%

Interacting Hand-Object Pose Estimation via Dense Mutual Attention

Wang, Mao, Li

2022

Preprint

3D hand-object pose estimation is the key to the success of many computer vision applications. The main focus of this task is to effectively model the interaction between the hand and an object. To this end, existing works either rely on interaction constraints in a computationally-expensive iterative optimization, or consider only a sparse correlation between sampled hand and object keypoints. In contrast, we propose a novel dense mutual attention mechanism that is able to model fine-grained dependencies between the hand and the object. Specifically, we first construct the hand and object graphs according to their mesh structures. For each hand node, we aggregate features from every object node by the learned attention and vice versa for each object node. Thanks to such dense mutual attention, our method is able to produce physically plausible poses with high quality and real-time inference speed. Extensive quantitative and qualitative experiments on large benchmark datasets show that our method outperforms state-of-the-art methods. The code is available at https://github.com/rongakowang/DenseMutualAttention.git.
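
The aggregation described in this abstract (each hand node attends to every object node, and vice versa) can be sketched roughly as below. This is not the authors' implementation: it assumes plain scaled dot-product attention with no learned projections and ignores the graph structure, and the names `cross_attend` and `dense_mutual_attention` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, others):
    """For each query node, return the attention-weighted mean of all
    feature vectors in `others` (scaled dot-product scores)."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in others]
        weights = softmax(scores)
        out.append([sum(w * o[j] for w, o in zip(weights, others)) for j in range(d)])
    return out


def dense_mutual_attention(hand_feats, obj_feats):
    """Hand nodes aggregate from all object nodes, and object nodes from all hand nodes."""
    return cross_attend(hand_feats, obj_feats), cross_attend(obj_feats, hand_feats)


# Toy example: two hand nodes and one object node, 2-D features.
hand = [[1.0, 0.0], [0.0, 1.0]]
obj = [[1.0, 1.0]]
hand_ctx, obj_ctx = dense_mutual_attention(hand, obj)
```

With a single object node, every hand node simply receives that node's feature. The object node scores both hand nodes equally here, so it receives their average.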

“…A key problem in computer vision is to understand how humans interact with their surroundings. Because hands are our primary means of manipulation with the physical world, there has been an intense interest in hand-object pose estimation [5, 14-16, 19, 39, 40] and the synthesis of static grasps for a given object [19,21,25,39]. However, human grasping is not limited to a single time instance, but involves a continuous interaction with objects in order to move them.…”

Section: Introduction (mentioning)

confidence: 99%

D-Grasp: Physically Plausible Dynamic Grasp Synthesis for Hand-Object Interactions

Christen, Kocabas, Aksan et al.

2021

Preprint

We introduce the dynamic grasp synthesis task: given an object with a known 6D pose and a grasp reference, our goal is to generate motions that move the object to a target 6D pose. This is challenging, because it requires reasoning about the complex articulation of the human hand and the intricate physical interaction with the object. We propose a novel method that frames this problem in the reinforcement learning framework and leverages a physics simulation, both to learn and to evaluate such dynamic interactions. A hierarchical approach decomposes the task into low-level grasping and high-level motion synthesis. It can be used to generate novel hand sequences that approach, grasp, and move an object to a desired location, while retaining human-likeness. We show that our approach leads to stable grasps and generates a wide range of motions. Furthermore, even imperfect labels can be corrected by our method to generate dynamic interaction sequences. Video is available at https://eth-ait.github.io/d-grasp/.
