AlignSDF: Pose-Aligned Signed Distance Fields for Hand-Object Reconstruction

Chen, Zerui; Hasson, Yana; Schmid, Cordelia; Laptev, Ivan

doi:10.1007/978-3-031-19769-7_14

Cited by 35 publications

(15 citation statements)

References 63 publications

“…DexYCB For the DexYCB with official split S0 test set, we compare our method with existing works (Chao et al, 2021, Chen et al, 2022b, Li et al, 2021, Lin et al, 2023, Spurr et al, 2020, Tse et al, 2022) that utilize monocular input as ours. Since DexYCB is a sequential dataset and does not provide ground-truth for vertices, we compute only hand joint accuracy and MKA. As shown in Tab.…”

Method | MPJPE(↓) | AUC J(↑) | MKA(↓) | FPS(↑)
ECCV22-Chen et al (Chen et al, 2022b) | 19.00 | - | - | -
ECCV20-Spurr et al (Spurr et al, 2020) | 17.34 | 0.698 | - | -
CVPR22-Tse et al (Tse et al, 2022) | 16.05 | 0.722 | - | -
CVPR22-Li et al (Li et al, 2021) | 12.80 | - | - | -
CVPR23-Yu et al (Yu et al, 2023) | 8.92 | - | - | -
CVPR21-Chao et al (Chao et al, 2021) | 6.83 | 0.864 | - | -
CVPR23-Lin et al (Lin et al, 2023) | 5.47 | - | - | -
CVPR23-H2ONet | 5

Section: Quantitative Results (mentioning)

confidence: 99%

“…2, the proposed method demonstrates superior computational efficiency compared to other methods. Recent studies evaluated with DexYCB (Chen et al, 2022b, Li et al, 2021, Lin et al, 2023, Tse et al, 2022, Yu et al, 2023) all aimed at the simultaneous reconstruction of hands and objects, so real-time performance is not guaranteed. Among the existing studies, the most recent work, H2ONet shows a significant improvement in accuracy compared to previous works.…”

Section: Quantitative Results (mentioning)

confidence: 99%

“…Generative approaches regress the pose and shape coefficients of the parametric hand model, typically MANO (Romero et al, 2017a), as a differentiable layer in the network. Recent works (Cao et al, 2021, Chen et al, 2022b, Hasson et al, 2019b, 2020, Wang et al, 2020a) propose the work with an autoencoder (Kingma and Welling, 2013), which combines an image feature encoder and a model parameter decoder. Additional supervision is often applied using the feature extracted in the intermediate step, such as segmentation map, projected 2D keypoints, etc (Baek et al, 2019, Boukhayma et al, 2019, Chen et al, 2021c, Lin et al, 2023, Zhang et al, 2019b, Zhou et al, 2020).…”

Section: 3D Hand Pose and Mesh Estimation From RGB (mentioning)

confidence: 99%

“…information; they reconstruct a dense hand mesh for its usability in applications (Chen et al, 2022b, Hasson et al, 2019b, 2020, Kulon et al, 2020, Ren et al, 2023, Yu et al, 2023, Zuo et al, 2023). However, while the studies focus on accuracy in various situations, they often fail to guarantee real-time performance and temporal coherence, which is crucial for real-world applications.…”

Section: Introduction (mentioning)

confidence: 99%

Temporally Enhanced Graph Convolutional Network for Hand Tracking from an Egocentric Camera

Cho, Ha, Jeon et al. 2024

Preprint

We propose a robust 3D hand tracking system in various hand action environments, including hand-object interaction, which utilizes a single color image and a previous pose prediction as input. We observe that existing methods deterministically exploit temporal information in motion space, failing to address realistic diverse hand motions. Also, prior methods paid less attention to efficiency as well as robust performance, i.e., the balance issues between time and accuracy. The Temporally Enhanced Graph Convolutional Network (TE-GCN) utilizes a 2-stage framework to encode temporal information adaptively. The system establishes balance by adopting an adaptive GCN, which effectively learns the spatial dependency between hand mesh vertices. Furthermore, the system leverages the previous prediction by estimating the relevance across image features through the attention mechanism. The proposed method achieves state-of-the-art balanced performance on challenging benchmarks and demonstrates robust results on various hand motions in real scenes. Moreover, the hand tracking system is integrated into a recent HMD with an off-loading framework, achieving a real-time framerate while maintaining high performance. Our study improves the usability of a high-performance hand-tracking method, which can be generalized to other algorithms and contributes to the usage of HMD in everyday life. Our code with the HMD project will be available at https://github.com/UVR-WJCHO/TEGCN_on_Hololens2

“…This advantage indicates that implicit functions can generalize to arbitrary hands. Several implicit hand models have been proposed, such as LISA (Corona et al 2022), AlignSDF (Chen et al 2022b), Im2Hands (Lee et al 2023), HandNeRF (Guo et al 2023), and Hand Avatar (Chen, Wang, and Shum 2023). However, compared with explicit models, the computational cost of implicit models is more expensive.…”

Section: Implicit Hand Models (mentioning)

confidence: 99%

Monocular 3D Hand Mesh Recovery via Dual Noise Estimation

Li, Lin, Huang et al. 2024

AAAI

Current parametric models have made notable progress in 3D hand pose and shape estimation. However, due to the fixed hand topology and complex hand poses, current models struggle to generate meshes that align well with the image. To tackle this issue, we introduce a dual noise estimation method in this paper. Given a single-view image as input, we first adopt a baseline parametric regressor to obtain the coarse hand meshes. We assume the mesh vertices and their image-plane projections are noisy, and can be associated in a unified probabilistic model. We then learn the distributions of noise to refine mesh vertices and their projections. The refined vertices are further utilized to refine camera parameters in a closed-form manner. Consequently, our method obtains well-aligned and high-quality 3D hand meshes. Extensive experiments on the large-scale Interhand2.6M dataset demonstrate that the proposed method not only improves the performance of its baseline by more than 10% but also achieves state-of-the-art performance. Project page: https://github.com/hanhuili/DNE4Hand.

MLPHand: Real Time Multi-view 3D Hand Reconstruction via MLP Modeling

Yang, Li et al. 2024

Lecture Notes in Computer Science
