ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes

Dai, Angela; Chang, Angel X.; Savva, Manolis; Halber, Maciej; Funkhouser, Thomas; Nießner, Matthias

doi:10.48550/arxiv.1702.04405

Cited by 177 publications

(35 citation statements)

References 18 publications


“…RGB-D from depth sensors. A large amount of RGB-D data from depth sensors has played a key role in driving recent research on single-image depth estimation [14,39,5,10,38]. But due to the limitations of depth sensors and the manual effort involved in data collection, these datasets lack the diversity needed for arbitrary real world scenes.…”

Section: Related Work (mentioning)

confidence: 99%

“…But due to the limitations of depth sensors and the manual effort involved in data collection, these datasets lack the diversity needed for arbitrary real world scenes. For example, KITTI [14] consists mainly of road scenes; NYU Depth [39], ScanNet [10] and Matterport3D [5] consist of only indoor scenes. Our work seeks to address this drawback by focusing on diverse images in the wild.…”

Section: Related Work (mentioning)

confidence: 99%

“…Despite significant recent progress [45,15,35,24,17,27,46,49,11,25,22,50,23,13,43,54,20,44], current systems still perform poorly on arbitrary images in the wild [6]. One major obstacle is the lack of diverse training data, as most existing RGB-D datasets were collected via depth sensors and are limited to rooms [39,10,5] and roads [14]. As shown by recent work [6], systems trained on such data are unable to generalize to diverse scenes in the real world.…”

Section: Introduction (mentioning)

confidence: 99%


Learning Single-Image Depth From Videos Using Quality Assessment Networks

Chen, Qian, Deng

2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)


Depth estimation from a single image in the wild remains a challenging problem. One main obstacle is the lack of high-quality training data for images in the wild. In this paper we propose a method to automatically generate such data through Structure-from-Motion (SfM) on Internet videos. The core of this method is a Quality Assessment Network that identifies high-quality reconstructions obtained from SfM. Using this method, we collect single-view depth training data from a large number of YouTube videos and construct a new dataset called YouTube3D. Experiments show that YouTube3D is useful in training depth estimation networks and advances the state of the art of single-view depth estimation in the wild. Project website: https


“…Dataset. The virtual scene dataset is built upon the scene datasets from SUNCG [Song et al 2017] and ScanNet [Dai et al 2017], encompassing both human-modeled synthetic scenes (66 from SUNCG) and human-scanned real scenes (38 from ScanNet). The collection contains 104 scenes spanning 5 categories, including bedrooms (21), sitting rooms (24), kitchens (20), etc.…”

Section: System and Implementation (mentioning)

confidence: 99%

Object-aware guidance for autonomous scene reconstruction

et al. 2018


…local scanning. First, an objectness-based segmentation method is introduced to extract semantic objects from the current scene surface via a multi-class graph-cut minimization. Then, an object of interest (OOI) is identified as the next-best object (NBO), which the robot aims to visit and scan. The robot then performs fine scanning of the OOI with views determined by the next-best-view (NBV) strategy. When the OOI is recognized as a complete object, it can be replaced by its most similar 3D model in a shape database. The algorithm iterates until all of the objects in the scene are recognized and reconstructed. Various experiments and comparisons demonstrate the feasibility of the proposed approach.


“…Datasets with depth and surface normals. Prior works on estimating depth or surface normals have mostly used NYU Depth [28], Make3D [27], KITTI [13], or ScanNet [8]. Although these datasets provide highly accurate depth, as pointed out by Chen et al [7] they are limited to specific types of scenes.…”

Section: Related Work (mentioning)

confidence: 99%

Surface Normals in the Wild

Chen, Xiang, Deng

2017

2017 IEEE International Conference on Computer Vision (ICCV)


We study the problem of single-image depth estimation for images in the wild. We collect human annotated surface normals and use them to train a neural network that directly predicts pixel-wise depth. We propose two novel loss functions for training with surface normal annotations. Experiments on NYU Depth and our own dataset demonstrate that our approach can significantly improve the quality of depth estimation in the wild.
