MVSCRF: Learning Multi-View Stereo With Conditional Random Fields

Xue, Youze; Chen, Jiansheng; Wan, Wei; Huang, Yiqing; Cheng, Yu; Li, Tianpeng; Bao, Jianmin

doi:10.1109/iccv.2019.00441

Cited by 91 publications

(51 citation statements)

References 16 publications

“…Earlier work in this area uses CNN's for two-view [121] and multi-view stereo [33]. Lately, the learning-based MVS rely on the construction of 3D cost volume and use the deep neural networks for regularization and depth regression [18,38,113,46,71,114,111]. As most of these approaches utilize 3D CNN for cost volume regularization -which in general is computationally expensive, the majority of the recent work is motivated to meet the computational requirement with it.…”

Section: Related Work (mentioning)

confidence: 99%

Neural Radiance Fields Approach to Deep Multi-View Photometric Stereo

Kaya, Kumar, Sarno et al. 2022

2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

We present a modern solution to the multi-view photometric stereo problem (MVPS). Our work suitably exploits the image formation model in a MVPS experimental setup to recover the dense 3D reconstruction of an object from images. We procure the surface orientation using a photometric stereo (PS) image formation model and blend it with a multi-view neural radiance field representation to recover the object's surface geometry. Contrary to the previous multi-staged framework to MVPS, where the position, isodepth contours, or orientation measurements are estimated independently and then fused later, our method is simple to implement and realize. Our method performs neural rendering of multi-view images while utilizing surface normals estimated by a deep photometric stereo network. We render the MVPS images by considering the object's surface normals for each 3D sample point along the viewing direction rather than explicitly using the density gradient in the volume space via 3D occupancy information. We optimize the proposed neural radiance field representation for the MVPS setup efficiently using a fully connected deep network to recover the 3D geometry of an object. Extensive evaluation on the DiLiGenT-MV benchmark dataset shows that our method performs better than the approaches that perform only PS or only multi-view stereo (MVS) and provides comparable results against the state-of-the-art multistage fusion methods.

“…Yao et al [31] proposed to replace 3D CNNs with recurrent neural networks, which leads to improved memory efficiency. Xue et al [32] proposed MVSCRF, where multi-scale conditional random fields (MSCRFs) are adopted to constraint the smoothness of depth prediction explicitly. Instead of using voxel grids, in this paper we propose to use a point-based network for MVS tasks to take advantage of 3D geometry learning without being burdened by the inefficiency found in 3D CNN computation.…”

Section: Related Work (mentioning)

confidence: 99%

Visibility-Aware Point-Based Multi-View Stereo Network

Chen, Han et al. 2021

IEEE Trans. Pattern Anal. Mach. Intell.

We introduce VA-Point-MVSNet, a novel visibility-aware point-based deep framework for multi-view stereo (MVS). Distinct from existing cost volume approaches, our method directly processes the target scene as point clouds. More specifically, our method predicts the depth in a coarse-to-fine manner. We first generate a coarse depth map, convert it into a point cloud and refine the point cloud iteratively by estimating the residual between the depth of the current iteration and that of the ground truth. Our network leverages 3D geometry priors and 2D texture information jointly and effectively by fusing them into a feature-augmented point cloud, and processes the point cloud to estimate the 3D flow for each point. This point-based architecture allows higher accuracy, more computational efficiency and more flexibility than cost-volume-based counterparts. Furthermore, our visibility-aware multi-view feature aggregation allows the network to aggregate multi-view appearance cues while taking into account occlusions. Experimental results show that our approach achieves a significant improvement in reconstruction quality compared with state-of-the-art methods on the DTU and the Tanks and Temples dataset. The code of VA-Point-MVSNet proposed in this work will be released at https://github.com/callmeray/PointMVSNet.
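The coarse-to-fine procedure in the abstract above begins by converting a coarse depth map into a point cloud. A minimal sketch of that unprojection step under a standard pinhole camera model is given here in plain Python. The function name `depth_to_points` and the intrinsics parameters `fx`, `fy`, `cx`, `cy` are illustrative assumptions and are not taken from the VA-Point-MVSNet code; the iterative refinement and visibility-aware aggregation are not shown.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def depth_to_points(depth: List[List[float]], fx: float, fy: float,
                    cx: float, cy: float) -> List[Point]:
    """Unproject a depth map into camera-space 3D points (pinhole model).

    depth[v][u] is the depth along the optical axis at pixel column u, row v.
    fx, fy are focal lengths in pixels; (cx, cy) is the principal point.
    Pixels with non-positive depth are treated as invalid and skipped
    (an assumption made for this sketch).
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:
                continue
            # Invert the projection u = fx * x / z + cx, v = fy * y / z + cy.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    # Toy 2x2 depth map with one invalid pixel.
    toy_depth = [[2.0, 2.0], [0.0, 4.0]]
    print(depth_to_points(toy_depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5))
```

The resulting points are expressed in the camera coordinate frame; placing them in a shared world frame would additionally require the camera extrinsics.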

“…Recently, learning-based MVS methods [6,7,12,23,24,30,31,33–39] have shown superior performance over traditional counterparts on MVS benchmarks [14,19]. These learning-based methods make use of convolutional neural networks (CNNs) to infer a depth map for each view, and carry out a separate multi-view depth fusion process to reconstruct 3D point clouds.…”

Section: Introduction (mentioning)

confidence: 99%

“…The lack of global context usually leads to local ambiguities in untextured or texture-less regions, thus reducing the robustness of matching. Although some recent works [31,34] try to obtain large context using deformable convolution or multi-scale information aggregation, the solution of mining the global context in each view has not been explored yet for MVS. Besides, in previous methods, the feature of each view is extracted independently from other views.…”

Section: Introduction (mentioning)

confidence: 99%

Multi-View Stereo with Transformer

Zhu, Peng, Li et al. 2021

Preprint

This paper proposes a network, referred to as MVSTR, for Multi-View Stereo (MVS). It is built upon Transformer and is capable of extracting dense features with global context and 3D consistency, which are crucial to achieving reliable matching for MVS. Specifically, to tackle the problem of the limited receptive field of existing CNN-based MVS methods, a global-context Transformer module is first proposed to explore intra-view global context. In addition, to further enable dense features to be 3D-consistent, a 3D-geometry Transformer module is built with a well-designed cross-view attention mechanism to facilitate inter-view information interaction. Experimental results show that the proposed MVSTR achieves the best overall performance on the DTU dataset and strong generalization on the Tanks & Temples benchmark dataset.
