Curvature-guided dynamic scale networks for Multi-view Stereo

Giang, Khang Truong; Song, Soohwan; Jo, Sungho

doi:10.48550/arxiv.2112.05999

Cited by 5 publications (6 citation statements); references 30 publications


“…Only the training set of DT is selected as source domain to train the model, and we evaluate the performance on other source domains. We directly utilize the open-sourced pretrained model of CasMVSNet [3], PatchMatchNet [59], Iter-MVS [60], MVSTER [47], CDS-MVSNet [61], and UniMVS-Net [5], to test on target domains without finetuning. These pre-trained models are further used for evaluation on unseen datasets on DT → BL, DT → GS, and DT → PA. We further provide experimental results on the same dataset (DT → DT) to evaluate the performance.…”

Section: E. Comparison With MVS Methods (mentioning)

RobustMVS: Single Domain Generalized Deep Multi-View Stereo

Xu, Chen, Sun et al., 2024

IEEE Trans. Circuits Syst. Video Technol.

Despite the impressive performance of Multi-view Stereo (MVS) approaches given plenty of training samples, the performance degradation when generalizing to unseen domains has not been clearly explored yet. In this work, we focus on the domain generalization problem in MVS. To evaluate the generalization results, we build a novel MVS domain generalization benchmark including synthetic and real-world datasets. In contrast to conventional domain generalization benchmarks, we consider a more realistic but challenging scenario, where only one source domain is available for training. The MVS problem can be analogized back to the feature matching task, and maintaining robust feature consistency among views is an important factor for improving generalization performance. To address the domain generalization problem in MVS, we propose a novel MVS framework, namely RobustMVS. A Depth-Clustering-guided Whitening (DCW) loss is further introduced to preserve the feature consistency among different views, which decorrelates multi-view features from viewpoint-specific style information based on geometric priors from depth maps. The experimental results further show that our method achieves superior performance on the domain generalization benchmark.
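The abstract describes the DCW loss only at a high level. To illustrate what "decorrelating features within groups defined by depth" can mean, here is a minimal NumPy sketch. It is not the RobustMVS loss: the function name `depth_binned_whitening_penalty`, the equal-frequency depth binning (a crude stand-in for depth clustering), and the off-diagonal covariance penalty are all hypothetical choices made for illustration.

```python
import numpy as np


def depth_binned_whitening_penalty(features, depths, num_bins=4, eps=1e-5):
    """Hypothetical decorrelation penalty (NOT the published DCW loss).

    features: (N, C) per-pixel features pooled from all views.
    depths:   (N,) depth value of each pixel, used only to group pixels.
    Pixels are grouped into equal-frequency depth bins (an assumed stand-in
    for depth clustering). Within each bin, channels are standardized and
    the off-diagonal entries of their covariance are penalized, pushing
    feature channels toward decorrelation.
    """
    features = np.asarray(features, dtype=np.float64)
    depths = np.asarray(depths, dtype=np.float64)
    # Quantile edges so each bin receives roughly the same number of pixels.
    edges = np.quantile(depths, np.linspace(0.0, 1.0, num_bins + 1))
    labels = np.clip(np.searchsorted(edges, depths, side="right") - 1, 0, num_bins - 1)

    total, used = 0.0, 0
    for b in range(num_bins):
        group = features[labels == b]
        if group.shape[0] < 2:
            continue  # covariance is undefined for fewer than two samples
        centered = group - group.mean(axis=0, keepdims=True)
        # Standardize channels so the penalty compares correlations, not scales.
        centered /= centered.std(axis=0, keepdims=True) + eps
        cov = centered.T @ centered / (group.shape[0] - 1)
        off_diag = cov - np.diag(np.diag(cov))
        total += np.mean(off_diag ** 2)
        used += 1
    return total / max(used, 1)
```

Perfectly correlated channels yield a clearly positive penalty, while mutually orthogonal channels within a bin yield zero. Grouping by depth is the only "geometric prior" used here; how RobustMVS actually forms its clusters and whitening targets is not specified in the abstract.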


“…Simultaneously, high-level features extracted from CNNs exhibit a high level of semantic abstraction, making them well-suited for classification rather than fine-grained feature matching. While some efforts leverage deformable convolutions [9] and normal curvatures [36] to improve the receptive fields in a flexible manner, the extracted features still have inductive biases. Equipped with long-range attention modules, ViTs can provide global perception for MVS models better than the low-level textures, and the patch-wise feature encoding of ViTs also works well for feature matching [37].…”

Section: Feature Extraction (mentioning)

Feature‐enhanced representation with transformers for multi‐view stereo

Xiang, Yin, 2024

IET Image Processing

Most existing multi‐view stereo (MVS) methods fail to consider global context information in the stage of feature extraction and cost aggregation. As transformers have shown remarkable performance on various vision tasks due to their ability to perceive global contextual information, this paper proposes a transformer‐based feature enhancement network (TF‐MVSNet) to facilitate feature representation learning by combining local features (both 2D and 3D) with long‐range contextual information. To reduce memory consumption of feature matching, the cross‐attention mechanism is leveraged to efficiently construct 3D cost volumes under the epipolar constraint. Additionally, a colour‐guided network is designed to refine depth maps at a coarse stage, hence reducing incorrect depth predictions at a fine stage. Extensive experiments were performed on the DTU dataset and Tanks and Temples (T&T) benchmark and results are reported.


“…Two classical methods (Altizure and OpenMVS) and several deep-learning-based methods from recent years are used for comparative analyses: EPP-MVSNet (Ma et al, 2021), PatchmatchNet (Wang et al, 2020), CDS-MVSNet (Giang et al, 2021) and MG-MVSNet (Zhang et al, 2023). In addition, the results for the original implementation of CasMVSNet (Gu et al, 2019) and TransMVSNet (Ding et al, 2021) are presented.…”

Section: Tanks and Temples (mentioning)

Adaptive region aggregation for multi‐view stereo matching using deformable convolutional networks

Hu, Su, Mao et al., 2023

The Photogrammetric Record

Deep‐learning methods have demonstrated promising performance in multi‐view stereo (MVS) applications. However, it remains challenging to apply a geometrical prior on the adaptive matching windows to achieve efficient three‐dimensional reconstruction. To address this problem, this paper proposes a learnable adaptive region aggregation method based on deformable convolutional networks (DCNs), which is integrated into the feature extraction workflow of MVSNet-style methods that use a coarse‐to‐fine structure. Following the conventional pipeline of MVSNet, a DCN is used to densely estimate and apply transformations in our feature extractor, which is composed of a deformable feature pyramid network (DFPN). Furthermore, we introduce a dedicated offset regulariser to promote the convergence of the learnable offsets of the DCN. The effectiveness of the proposed DFPN is validated through quantitative and qualitative evaluations on the BlendedMVS and Tanks and Temples benchmark datasets within a cross‐dataset evaluation setting.
