2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
DOI: 10.1109/cvprw56347.2022.00036
TMVNet : Using Transformers for Multi-view Voxel-based 3D Reconstruction

Cited by 18 publications (17 citation statements) · References 23 publications
“…Based on this network, they proposed an RNN-based model to obtain the corresponding 3D representation from the input images. TMVNet [18] applied transformers in the encoder and proposed a 3D feature fusion layer to refine the predictions. Kniaz et al. [19] proposed an image-to-voxel translation model based on a generative adversarial network.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
“…The reconstruction is decoded from the weighted sum of latent codes. Transformer models incorporating self-attention have also been proposed for 3D reconstruction [10][11][12]. None of the attention-based methods supports iterative updating of a previous reconstruction, since these architectures expect to receive all input images at once.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
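
The "all input images at once" constraint follows from how set-level self-attention is computed: the attention weights are formed jointly over the complete set of per-view latent codes, so adding a view later means re-encoding the whole set. A minimal PyTorch sketch (not code from any of the cited papers; the fusion module, dimensions, and mean-pooling are assumptions):

```python
# Sketch: set-level self-attention over per-view latent codes.
# Attention is computed jointly over ALL views, so the full set
# must be available up front; a new view requires a full re-run.
import torch
import torch.nn as nn

class SetAttentionFusion(nn.Module):
    """Fuses N per-view latent codes with multi-head self-attention."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, view_codes: torch.Tensor) -> torch.Tensor:
        # view_codes: (batch, n_views, dim) -- the complete set of views.
        fused, _ = self.attn(view_codes, view_codes, view_codes)
        # Pool over views into a single latent code for the decoder.
        return fused.mean(dim=1)

codes = torch.randn(2, 5, 256)        # 5 views, all available at once
latent = SetAttentionFusion()(codes)  # -> (2, 256)
```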
“…IoU comparison:

Method           Year  IoU
AttSets [7]      2020  0.685
Pix2Vox++/F [9]  2020  0.696
Pix2Vox++/A [9]  2020  0.715
EVolT [10]       2021  0.698
TMVNet [12]      2022  0.719
3D-R2N2 [6]      2016  0.635
Ours                   0.690
…”
Section: IoU Iterative
Citation type: mentioning (confidence: 99%)
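
For context, IoU values like those above are conventionally computed as voxel-grid intersection-over-union between the thresholded predicted occupancy and the binary ground truth. A minimal sketch (the 0.3 threshold and the 32^3 grid size are common choices in this literature, assumed here rather than taken from the table's source):

```python
# Sketch: voxel IoU between a predicted occupancy grid and ground truth.
import torch

def voxel_iou(pred: torch.Tensor, gt: torch.Tensor, thresh: float = 0.3) -> float:
    """pred: occupancy probabilities in [0, 1]; gt: binary occupancy grid."""
    p = pred > thresh
    g = gt.bool()
    intersection = (p & g).sum().item()
    union = (p | g).sum().item()
    return intersection / union if union > 0 else 1.0

pred = torch.rand(32, 32, 32)               # e.g. a 32^3 occupancy prediction
gt = (torch.rand(32, 32, 32) > 0.5).float()
print(voxel_iou(pred, gt))
```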
“…Therefore, it lacks stochastic learning capability in the mapping between the extracted image features and the reconstructed 3D models. Peng et al. [24] proposed a transformer-based encoder-decoder called TMVNet, which outperforms previous methods for 3D reconstruction. This method uses 2D CNN encoders to extract multi-view image features and passes the extracted features to two transformer encoders to generate 3D feature vectors.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
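
Read literally, the excerpt describes a three-stage pipeline: shared 2D CNN features per view, transformer encoders over the set of view features, and a decoder producing the voxel grid. A minimal sketch of that pipeline shape (all layer sizes, the single encoder stack, the mean-pooling, and the decoder head are illustrative assumptions, not TMVNet's published architecture):

```python
# Sketch: multi-view voxel reconstruction with a CNN encoder per view
# and a transformer encoder fusing the per-view features.
import torch
import torch.nn as nn

class MultiViewVoxelNet(nn.Module):
    def __init__(self, feat_dim: int = 256, n_layers: int = 2):
        super().__init__()
        # Small 2D CNN backbone shared across views (hypothetical).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Transformer encoder over the set of per-view feature tokens.
        layer = nn.TransformerEncoderLayer(feat_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Decode the fused feature into a 32^3 occupancy grid.
        self.decoder = nn.Sequential(nn.Linear(feat_dim, 32 ** 3), nn.Sigmoid())

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (batch, n_views, 3, H, W)
        b, v = views.shape[:2]
        feats = self.cnn(views.flatten(0, 1)).view(b, v, -1)  # (b, v, feat_dim)
        fused = self.encoder(feats).mean(dim=1)               # pool over views
        return self.decoder(fused).view(b, 32, 32, 32)        # occupancy probs

vox = MultiViewVoxelNet()(torch.randn(1, 5, 3, 64, 64))       # -> (1, 32, 32, 32)
```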