2023
DOI: 10.1002/cav.2201
TMSDNet: Transformer with multi‐scale dense network for single and multi‐view 3D reconstruction

Xiaoqiang Zhu, Xinsheng Yao, Junjie Zhang, et al.

Abstract: 3D reconstruction is a long‐standing problem. Recently, a number of studies have emerged that use transformers for 3D reconstruction, and these approaches have demonstrated strong performance. However, transformer‐based 3D reconstruction methods tend either to establish the mapping between the 2D image and the 3D voxel space directly with transformers, or to rely solely on the transformers' powerful feature extraction capabilities. They ignore the crucial role played by deep multi‐scale…
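The abstract stops before the architecture details, but the pattern it names — multi-scale features densely fused and fed to a transformer that decodes a voxel grid — can be sketched roughly. The PyTorch sketch below is a hypothetical illustration of that general pattern, not the paper's actual TMSDNet: every module name, layer size, the dense-fusion strategy, and the 32³ output grid are assumptions made for the example.

# Hypothetical sketch of the pattern named in the abstract: CNN features at
# several scales are densely fused into tokens, and a transformer encoder
# maps them to a voxel occupancy grid. Sizes and names are illustrative,
# not the paper's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleDenseEncoder(nn.Module):
    """Extracts feature maps at three scales and densely concatenates them."""
    def __init__(self, dim=128):
        super().__init__()
        self.s1 = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU())
        self.s2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.s3 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
        # Project the dense concatenation of all scales to one token width.
        self.proj = nn.Conv2d(32 + 64 + 128, dim, 1)

    def forward(self, x):
        f1 = self.s1(x)                      # (B, 32, H/2, W/2)
        f2 = self.s2(f1)                     # (B, 64, H/4, W/4)
        f3 = self.s3(f2)                     # (B, 128, H/8, W/8)
        size = f3.shape[-2:]
        # Dense fusion: resample every scale to the coarsest resolution.
        f1 = F.adaptive_avg_pool2d(f1, size)
        f2 = F.adaptive_avg_pool2d(f2, size)
        fused = self.proj(torch.cat([f1, f2, f3], dim=1))
        return fused.flatten(2).transpose(1, 2)   # (B, tokens, dim)

class VoxelTransformer(nn.Module):
    """Transformer over fused multi-scale tokens, decoded to a 32^3 grid."""
    def __init__(self, dim=128, vox=32):
        super().__init__()
        self.encoder = MultiScaleDenseEncoder(dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        self.vox = vox
        self.head = nn.Linear(dim, vox ** 3)

    def forward(self, views):                # views: (B, V, 3, H, W)
        b = views.shape[0]
        tokens = self.encoder(views.flatten(0, 1))         # (B*V, T, dim)
        tokens = tokens.reshape(b, -1, tokens.shape[-1])   # merge view tokens
        enc = self.transformer(tokens)
        logits = self.head(enc.mean(dim=1))                # pool, then decode
        return logits.view(b, self.vox, self.vox, self.vox).sigmoid()

# Usage: two 128x128 views per sample -> a 32^3 occupancy grid.
model = VoxelTransformer()
occupancy = model(torch.randn(2, 2, 3, 128, 128))   # shape (2, 32, 32, 32)

Concatenating all view tokens before the transformer, as done here, is one simple way to let attention fuse single- and multi-view inputs with the same code path; the paper's actual fusion mechanism may differ.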

Cited by 2 publications (1 citation statement) | References: 52 publications
“…Iashin et al 20 proposed a dual-modal Transformer structure designed to process both audio and video inputs, fostering mutual learning between the two modalities. Zhu et al 21 proposed a new Transformer framework (TMSDNet) for single-view and multi-view 3D reconstruction. Zhu et al 22 enhanced BERT and introduced ActBERT for self-supervised learning of joint video-text representations from unlabeled videos.…”
Section: Transformer-related Approaches
Confidence: 99%