2020
DOI: 10.1007/978-3-030-58571-6_25
Atlas: End-to-End 3D Scene Reconstruction from Posed Images

Abstract: We present an end-to-end 3D reconstruction method for a scene by directly regressing a truncated signed distance function (TSDF) from a set of posed RGB images. Traditional approaches to 3D reconstruction rely on an intermediate representation of depth maps prior to estimating a full 3D model of a scene. We hypothesize that a direct regression to 3D is more effective. A 2D CNN extracts features from each image independently, which are then back-projected and accumulated into a voxel volume using the camera intr…
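
The back-projection step the abstract describes is easy to sketch. Below is a minimal NumPy illustration of projecting voxel centers into an image and gathering per-voxel features; this is not the authors' code — every function and parameter name here is an assumption, and nearest-neighbor sampling stands in for whatever interpolation the paper actually uses.

```python
import numpy as np

def backproject_features(feat2d, K, T_cw, voxel_origin, voxel_size, grid_dim):
    """Gather a 2D feature map into a voxel grid by projecting each voxel
    center into the image (a sketch of the back-projection step described
    in the abstract; names and shapes are assumptions).

    feat2d:       (C, H, W) feature map from the 2D CNN
    K:            (3, 3) camera intrinsics
    T_cw:         (4, 4) world-to-camera extrinsics
    voxel_origin: (3,) world coordinates of voxel (0, 0, 0)
    voxel_size:   edge length of one voxel in meters
    grid_dim:     (nx, ny, nz) voxel grid dimensions
    """
    C, H, W = feat2d.shape
    nx, ny, nz = grid_dim

    # World coordinates of every voxel center.
    ii, jj, kk = np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz),
                             indexing="ij")
    xyz_w = voxel_origin + (np.stack([ii, jj, kk], -1) + 0.5) * voxel_size
    xyz_w = xyz_w.reshape(-1, 3)                                  # (N, 3)

    # Transform into the camera frame and project with the pinhole model.
    xyz_c = (T_cw[:3, :3] @ xyz_w.T + T_cw[:3, 3:4]).T            # (N, 3)
    z = xyz_c[:, 2]
    uvw = (K @ xyz_c.T).T                                         # (N, 3)
    u = uvw[:, 0] / np.clip(z, 1e-6, None)
    v = uvw[:, 1] / np.clip(z, 1e-6, None)

    # Keep voxels that land inside the image and in front of the camera.
    valid = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)

    vol = np.zeros((C, nx * ny * nz), dtype=feat2d.dtype)
    ui = u[valid].astype(int)
    vi = v[valid].astype(int)
    vol[:, valid] = feat2d[:, vi, ui]   # nearest-neighbor sampling for brevity
    return vol.reshape(C, nx, ny, nz), valid.reshape(nx, ny, nz)
```

Accumulating features from multiple views then amounts to summing these per-view volumes and dividing by a per-voxel visibility count, as in the averaging sketch further below.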

Cited by 190 publications (124 citation statements)
References 57 publications
“…With the success of deep learning, a number of learning-based techniques have been proposed to tackle the problem. While several methods learn to directly predict 3D geometry as grids [25,24], point clouds [7], and TSDFs [35], per-view depth-map estimation is still the top choice of most approaches [47,51,23,32,21,29,33,10,36] due to its robustness and flexibility. Most of those methods follow the spirit of conventional approaches [14,6] and train a cost-volume-based neural network.…”
Section: Related Work
confidence: 99%
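
The cost-volume construction this excerpt alludes to can be sketched in a few lines. Below is a simplified single-source plane sweep in NumPy; the dot-product matching score, nearest-neighbor sampling, and all names are illustrative assumptions, not any cited paper's implementation.

```python
import numpy as np

def plane_sweep_cost_volume(feat_ref, feat_src, K, R, t, depths):
    """Minimal plane-sweep cost volume between a reference and one source
    view (a sketch of the cost-volume idea the excerpt mentions).

    feat_ref, feat_src: (C, H, W) feature maps
    K:                  (3, 3) shared intrinsics
    R, t:               rotation (3, 3) / translation (3,) mapping points
                        from the reference to the source camera frame
    depths:             (D,) depth hypotheses in the reference view
    """
    C, H, W = feat_ref.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))                # (H, W)
    pix = np.stack([u, v, np.ones_like(u)], 0).reshape(3, -1)     # (3, H*W)
    rays = np.linalg.inv(K) @ pix                                 # per-pixel rays

    cost = np.zeros((len(depths), H, W), dtype=feat_ref.dtype)
    for i, d in enumerate(depths):
        X = rays * d                           # points on the depth-d plane
        Xs = R @ X + t[:, None]                # into the source frame
        uvw = K @ Xs
        us = (uvw[0] / np.clip(uvw[2], 1e-6, None)).round().astype(int)
        vs = (uvw[1] / np.clip(uvw[2], 1e-6, None)).round().astype(int)
        ok = (uvw[2] > 0) & (us >= 0) & (us < W) & (vs >= 0) & (vs < H)
        warped = np.zeros_like(feat_src.reshape(C, -1))
        warped[:, ok] = feat_src[:, vs[ok], us[ok]]
        # Dot-product matching score per pixel at this depth hypothesis
        # (higher means better photometric/feature agreement).
        cost[i] = (feat_ref.reshape(C, -1) * warped).sum(0).reshape(H, W)
    return cost
```

A depth-map network would then regularize this (D, H, W) volume, typically with 2D or 3D convolutions, and take a soft argmax over the depth axis.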
“…Recently, neural implicit representations have demonstrated promising results for object geometry representation [7, 18, 20, 28, 30-32, 36, 50, 54, 57, 58], scene completion [5,14,33], novel view synthesis [19,21,34,60], and also generative modelling [6,26,27,39]. A few recent papers [1,3,8,23,44] attempt to predict scene-level geometry from RGB(-D) inputs, but they all assume given camera poses. Another set of works [17,51,59] tackles the problem of camera pose optimization, but these methods require a rather long optimization process, which is not suitable for real-time applications.…”
Section: Related Work
confidence: 99%
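
At their core, the neural implicit representations this excerpt surveys are coordinate networks: an MLP mapping a 3D point to a scalar such as a signed distance. A tiny PyTorch sketch, with assumed (illustrative) depth and width:

```python
import torch
import torch.nn as nn

class ImplicitSDF(nn.Module):
    """Minimal coordinate MLP mapping 3D points to signed distance values,
    the basic form of a neural implicit surface representation (layer
    sizes are illustrative assumptions, not any cited paper's design)."""

    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
        )

    def forward(self, xyz):          # xyz: (N, 3) query points
        return self.net(xyz)         # (N, 1) signed distance
```

The surface is then the zero level set of this function, extractable with marching cubes over a grid of queries.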
“…Our method fuses input view features using a transformer. We compare to Atlas [28], which fuses features by averaging, and NeuralRecon [37], which fuses locally by averaging and globally by an RNN. Our method produces a high level of detail, while also filling in holes due to occlusion and unobserved regions.…”
Section: Introduction
confidence: 99%
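
For contrast with the transformer-based fusion, the averaging baseline this excerpt attributes to Atlas reduces to a masked running mean over views. A minimal sketch, assuming per-view volumes and visibility masks like those produced by the back-projection sketch above:

```python
import numpy as np

def fuse_by_averaging(view_volumes, view_masks):
    """Masked average of per-view feature volumes (a sketch of the
    averaging-style fusion described in the excerpt, not the paper's
    implementation).

    view_volumes: list of (C, nx, ny, nz) back-projected feature volumes
    view_masks:   list of (nx, ny, nz) booleans marking observed voxels
    """
    acc = np.zeros_like(view_volumes[0])
    count = np.zeros(view_volumes[0].shape[1:], dtype=np.float32)
    for vol, mask in zip(view_volumes, view_masks):
        acc += vol * mask            # only accumulate observed voxels
        count += mask
    return acc / np.clip(count, 1.0, None)   # unobserved voxels stay zero
```

Averaging treats all views equally, which is exactly the limitation that attention-based fusion (weighting each view per voxel) is meant to address.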
“…Recently, a number of works have addressed this by posing RGB-only 3D reconstruction as the direct prediction of a truncated signed-distance function (TSDF), using deep learning to fill in unobserved regions via learned priors [28,37]. These methods extract image features using a convolutional neural network (CNN), accumulate them in space by back-projecting onto a 3D grid, and then predict the TSDF volume using a 3D CNN.…”
Section: Introduction
confidence: 99%
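
The final stage of the pipeline described here — predicting a TSDF volume from the fused features with a 3D CNN — can be illustrated with a toy PyTorch head. The layer sizes and the tanh truncation are assumptions for the sketch, not the cited papers' designs.

```python
import torch
import torch.nn as nn

class TSDFHead(nn.Module):
    """Toy 3D CNN mapping a fused feature volume to a TSDF volume,
    illustrating the last stage of the pipeline described above."""

    def __init__(self, in_channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_channels, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(32, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(16, 1, 1),
        )

    def forward(self, fused_volume):
        # fused_volume: (B, C, nx, ny, nz). Output lies in (-1, 1),
        # interpreted as a normalized truncated signed distance.
        return torch.tanh(self.net(fused_volume))

# Usage sketch: a batch of one 64^3 volume with 32 feature channels.
head = TSDFHead(in_channels=32)
tsdf = head(torch.randn(1, 32, 64, 64, 64))   # -> (1, 1, 64, 64, 64)
```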