2017 IEEE International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2017.253
SurfaceNet: An End-to-End 3D Neural Network for Multiview Stereopsis

Abstract: This paper proposes an end-to-end learning framework for multiview stereopsis. We term the network SurfaceNet. It takes a set of images and their corresponding camera parameters as input and directly infers the 3D model. The key advantage of the framework is that both photo-consistency as well as geometric relations of the surface structure can be directly learned for the purpose of multiview stereopsis in an end-to-end fashion. SurfaceNet is a fully 3D convolutional network which is achieved by encoding the came…

Cited by 412 publications (322 citation statements).
References 30 publications.
“…The simplest representation for 3D reconstruction from one or more images are 2.5D depth maps as they can be inferred using standard 2D convolutional neural networks [14,18,24,43]. Since depth maps are view-based, these methods require additional post-processing algorithms to fuse information from multiple viewpoints in order to capture the entire object geometry.…”
Section: 3D Reconstruction (mentioning, confidence: 99%)
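The statement above notes that view-based depth maps need post-processing to fuse multiple viewpoints into one geometry. As a minimal sketch of what that fusion entails, the following hypothetical helper (`depth_to_points` is not from the cited works) back-projects a single depth map into world-space points using the camera intrinsics and extrinsics; fusing views then amounts to merging the resulting point sets, which real pipelines additionally filter for cross-view consistency:

```python
import numpy as np

def depth_to_points(depth, K, cam_to_world):
    """Back-project a 2.5D depth map into world-space 3D points.

    depth:        (H, W) array of depths along the camera z-axis.
    K:            (3, 3) camera intrinsics.
    cam_to_world: (4, 4) camera-to-world extrinsics.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Homogeneous pixel coordinates, one row per pixel.
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T            # camera-space viewing rays
    pts_cam = rays * depth.reshape(-1, 1)      # scale each ray by its depth
    pts_h = np.concatenate([pts_cam, np.ones((pts_cam.shape[0], 1))], axis=1)
    return (pts_h @ cam_to_world.T)[:, :3]

# Naive fusion is then just concatenating the per-view point sets:
# cloud = np.concatenate([depth_to_points(d, K, T) for d, K, T in views])
```

This only aggregates points; recovering a watertight surface from the merged cloud is exactly the extra step that end-to-end volumetric methods such as SurfaceNet avoid.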
“…In the last decade, major breakthroughs in shape extraction were due to deep neural networks coupled with the abundance of visual data. Recent works focus on learning 3D reconstruction using 2.5D [14,16,24,43], volumetric [7,11,13,18,30,42], mesh [12,21] and point cloud [10,27] representations. However, none of the above are sufficiently parsimonious or interpretable to allow for higher-level 3D scene understanding as required by intelligent systems.…”
Section: Introduction (mentioning, confidence: 99%)
“…We generate the ground truth depth maps from the point cloud with the screened Poisson surface reconstruction method [15]. We choose scenes 1, 4, 9, 10, 11, 12, 13, 15, 23, 24, 29, 32, 33, 34, 48, 49, 62, 75, 77, 110, 114, 118 as the testing set and the other scenes as the training set. The RGBD, SUN3D, MVS and Scenes11 datasets contain more than 30000 different scenes in total, which are very different from the DTU dataset.…”
Section: Implementation Details (mentioning, confidence: 99%)
“…Point cloud, acquired directly by off-the-shelf 3D scanners like Microsoft Kinect or estimated indirectly via stereo-matching algorithms [1], is a recently popular 3D visual signal representation for free viewpoint image rendering, and is investigated in industrial standards like MPEG 1 . Unlike 3D meshes, a point cloud is an unstructured list of 3D coordinates, and low-level processing tasks like compression [2][3][4] and denoising [5,6] are challenging.…”
Section: Introduction (mentioning, confidence: 99%)
“…In [12][13][14], surface interpolation is accomplished using the Moving Least Squares (MLS) method. However, it is observed that these methods often over-smooth due to MLS interpolation.…”
(Footnote 1 in the quoted paper: https://mpeg.chiariglione.org/standards/mpeg-i/point-cloud-compression)
Section: Introduction (mentioning, confidence: 99%)
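The over-smoothing behaviour the statement above attributes to MLS interpolation is easiest to see in the zeroth-order case. The following sketch (the function name `mls_smooth` and the Gaussian kernel bandwidth `h` are illustrative choices, not from the cited works) replaces each point with the Gaussian-weighted average of all points; higher-order MLS fits a local polynomial instead, but the contraction toward local averages, and hence the loss of fine detail, is inherent to the weighting:

```python
import numpy as np

def mls_smooth(points, h=0.1):
    """Order-0 Moving Least Squares smoothing of a point set.

    Each point is replaced by the Gaussian-weighted average of all
    points, with kernel bandwidth h. Larger h averages over a wider
    neighbourhood and smooths (and over-smooths) more aggressively.
    """
    # Pairwise squared distances, shape (N, N).
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (h * h))                  # Gaussian weights
    return (w @ points) / w.sum(axis=1, keepdims=True)
```

Because every output point is a convex combination of the inputs, the smoothed set strictly shrinks toward local centroids whenever the weights overlap, which is precisely the over-smoothing the quoted paper objects to.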