2021
DOI: 10.48550/arxiv.2107.10844
Preprint

DOVE: Learning Deformable 3D Objects by Watching Videos

Abstract: Learning deformable 3D objects from 2D images is an extremely ill-posed problem. Existing methods rely on explicit supervision to establish multi-view correspondences, such as template shape models and keypoint annotations, which restricts their applicability to objects "in the wild". In this paper, we propose to use monocular videos, which naturally provide correspondences across time, allowing us to learn 3D shapes of deformable object categories without explicit keypoints or template shapes. Specifically, w…



Cited by 4 publications (10 citation statements). References: 58 publications.
“…Kanazawa et al [7] learn to regress 3D bird shape, given keypoints and silhouettes; birds exhibit rather limited articulation. Recent work obviates the need for 2D keypoints [6,22,26]. Model-based 3D Reconstruction.…”
Section: Related Work (mentioning; confidence: 99%)
“…Templates are built for various objects including human body [34,1], hand [53], face [29,3], and animals [85,2]. Template-free methods [19,30,25,64,57,70,23,44,69] study the problem of predicting nonrigid objects directly for a specific category. However, these methods heavily rely on strong category priors such as key-points annotations [19], canonical shapes [10], temporal consistency constraints [30], or canonical surface mapping [23], making them hard to generalize to category-agnostic setting.…”
Section: Related Work (mentioning; confidence: 99%)
“…Recent works [74-76, 70, 45] demonstrate promising results. However, they often impose non-realistic assumptions on articulation, such as control-point-driven deformation [74-76, 70] or freeform deformation [45,50]. As a result, they fall short of the goal of modeling skeletal characters that can be realistically re-animated in downstream applications.…”
Section: Introduction (mentioning; confidence: 99%)
“…Category reconstruction from image/video collections. A number of recent methods build deformable 3D models of object categories from images or videos with weak 2D annotations, such as keypoints, object silhouettes, and optical flow, obtained from human annotators or predicted by off-the-shelf models [6,10,14,18,19,52,60]. Such methods often rely on a coarse shape template [16,47], and are not able to recover fine-grained details or large articulations.…”
Section: Related Work (mentioning; confidence: 99%)