2022
DOI: 10.1007/978-3-031-19784-0_11

KeypointNeRF: Generalizing Image-Based Volumetric Avatars Using Relative Spatial Encoding of Keypoints

Abstract: Image-based volumetric humans using pixel-aligned features promise generalization to unseen poses and identities. Prior work leverages global spatial encodings and multi-view geometric consistency to reduce spatial ambiguity. However, global encodings often suffer from overfitting to the distribution of the training data, and it is difficult to learn multi-view consistent reconstruction from sparse views. In this work, we investigate common issues with existing spatial encodings and propose a simple yet highly…

Cited by 54 publications (15 citation statements)
References 77 publications
“…The optimization process generally takes several hours on a modern GPU, which is time-consuming and costly to scale. Inspired by multi-view stereo matching [72], some methods [8,12,45,73,87,90,99] train a network on multi-view datasets to learn to infer radiance fields from input images. This enables them to quickly fine-tune neural representations to unseen scenes.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…However, in sparse-view settings, the pre-fitted predictions suffer from misalignment errors that consequently hurt the quality of the synthesized views. Mihajlovic et al [20] utilized 3D keypoints instead of body models to avoid parametric fitting errors. L-NeRF [30] introduced a time-synchronization step that accounts for the multi-view image de-synchronization by producing a per-view body model using predicted time offsets.…”
Section: Human Mesh Recovery
Citation type: mentioning
confidence: 99%
“…Comparison with generalizable NeRF methods. Generalizable human-based NeRF methods [3,12,20,44] operate only on scenes with single humans. We choose to compare against NHP [12] after adjusting it to work on multihuman scenes by using the segmentation masks to render a separate image for each individual in the scene.…”
Section: Baselines
Citation type: mentioning
confidence: 99%
“…Differentiable rendering based on NeRFs [Mildenhall et al 2020] has also been applied to learn 3D human representations from images. Both person-specific models [Liu et al 2021a;Peng et al 2021b;Weng et al 2022] and generalizable models across identities [Choi et al 2022;Gao et al 2022;Hu et al 2023;Kwon et al 2021;Mihajlovic et al 2022] have been proposed, but the training requires multi-view images or videos. They are difficult to collect at scale such that the collected data covers a sufficient span of clothing types and textures.…”
Section: 3D Human Reconstruction From a Single Image
Citation type: mentioning
confidence: 99%