“…Generative approaches regress the pose and shape coefficients of a parametric hand model, typically MANO (Romero et al, 2017a), which is used as a differentiable layer in the network. Recent works (Cao et al, 2021, Chen et al, 2022b, Hasson et al, 2019b, 2020, Wang et al, 2020a) adopt an autoencoder (Kingma and Welling, 2013) that combines an image feature encoder with a model parameter decoder. Additional supervision is often applied to features extracted at intermediate steps, such as segmentation maps and projected 2D keypoints (Baek et al, 2019, Boukhayma et al, 2019, Chen et al, 2021c, Lin et al, 2023, Zhang et al, 2019b, Zhou et al, 2020).…”
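
The encoder-decoder regression described in the excerpt can be illustrated with a minimal sketch. It is not the implementation of any cited work: the class `HandParamRegressor`, the helper functions, the layer sizes, and the random toy image are all hypothetical, and plain linear layers stand in for a real image backbone. The only facts taken from the MANO model are the parameter counts (48 axis-angle pose values, 10 shape coefficients). The differentiable MANO layer that would turn these parameters into a mesh, and any intermediate supervision, are deliberately left out.

```python
import random

POSE_DIM = 48   # MANO axis-angle pose: 3 global rotation + 15 joints x 3
SHAPE_DIM = 10  # MANO shape coefficients


def make_linear(n_in, n_out, rng):
    """Return (weights, bias) for a small randomly initialised linear layer."""
    weights = [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def linear(layer, x):
    """Apply y = W x + b to a flat list of floats."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in x]


class HandParamRegressor:
    """Toy encoder-decoder (hypothetical): image -> feature -> (pose, shape)."""

    def __init__(self, image_size, feat_dim=32, seed=0):
        rng = random.Random(seed)
        # Stand-in for an image feature encoder (a CNN in practice).
        self.encoder = make_linear(image_size, feat_dim, rng)
        # Stand-in for the model parameter decoder.
        self.decoder = make_linear(feat_dim, POSE_DIM + SHAPE_DIM, rng)

    def forward(self, image):
        feature = relu(linear(self.encoder, image))
        params = linear(self.decoder, feature)
        # Split the regressed vector into pose and shape coefficients.
        return params[:POSE_DIM], params[POSE_DIM:]


def l2_loss(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Toy usage: a flattened 16x16 "image" and all-zero target parameters.
rng = random.Random(1)
image = [rng.random() for _ in range(16 * 16)]
model = HandParamRegressor(image_size=len(image))
pose, shape = model.forward(image)
loss = l2_loss(pose, [0.0] * POSE_DIM) + l2_loss(shape, [0.0] * SHAPE_DIM)
```

In a full pipeline the regressed `pose` and `shape` would be passed through the MANO layer to obtain mesh vertices and joints, so that losses on those outputs can be back-propagated to the encoder.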