2021 International Conference on 3D Vision (3DV) 2021
DOI: 10.1109/3dv53792.2021.00042
VoRTX: Volumetric 3D Reconstruction With Transformers for Voxelwise View Selection and Fusion

Cited by 43 publications (15 citation statements) · References 39 publications
“…Then NeuralRecon (Sun et al., 2021) improves efficiency by doing this within a local window and then fusing the predictions together using a GRU module. Recently, to improve the accuracy of the reconstruction, some methods also introduce transformers to fuse 2D features from different views (Bozic et al., 2021; Stier et al., 2021). However, their transformers are all limited to 2D space and used to process 2D features, which is not straightforward for the 3D reconstruction task.…”
Section: Related Work
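To make the contrast concrete, here is a minimal sketch of the kind of transformer-based view fusion described above: the 2D features a voxel gathers from each view are treated as a token sequence, mixed with self-attention, and pooled into one fused feature. All names, dimensions, and the mean-pooling step are assumptions for illustration, not the implementation of any cited paper.

```python
import torch
import torch.nn as nn

class ViewFusionTransformer(nn.Module):
    """Fuse per-view features for each voxel with self-attention.

    Input:  (num_voxels, num_views, feat_dim) -- one token per view.
    Output: (num_voxels, feat_dim)            -- one fused feature per voxel.
    """

    def __init__(self, feat_dim: int = 64, num_heads: int = 4, num_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        # Attention runs over the view axis, so each view's feature can
        # attend to the others before the tokens are pooled per voxel.
        mixed = self.encoder(view_feats)   # (num_voxels, num_views, feat_dim)
        return mixed.mean(dim=1)           # (num_voxels, feat_dim)

# Example: 1000 voxels, each observed by 5 views with 64-dim features.
feats = torch.randn(1000, 5, 64)
fused = ViewFusionTransformer()(feats)     # -> (1000, 64)
```

Note that the attention here operates only across views of a single voxel, which is exactly the "limited in 2D space" behavior the quoted passage criticizes: no attention is exchanged between different voxels of the 3D volume.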
“…Limited by the high resource consumption of multi-head attention, most previous work on 3D transformers is restricted to resource-saving feature processing, e.g., one-off straightforward feature mapping without any downsampling or upsampling (Wang et al., 2021a), where the size of the feature volumes remains unchanged, or top-down tasks with only downsampling (Mao et al., 2021), where the size of the feature volumes is reduced gradually. In 3D reconstruction, however, a top-down-bottom-up structure is more reasonable for feature extraction and prediction generation, as in most 3D-CNN-based structures (Murez et al., 2020; Sun et al., 2021; Stier et al., 2021). So in this work, we design the first 3D-transformer-based top-down-bottom-up structure, as shown in Figure 3.…”
Section: SDF 3D Transformer
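The top-down-bottom-up pattern the quote refers to can be illustrated with the 3D-CNN baseline it contrasts against: an encoder that halves the spatial resolution of the feature volume, a decoder that restores it, and a skip connection between them. This is a hedged sketch under assumed channel counts and layer choices, not the architecture of any cited paper.

```python
import torch
import torch.nn as nn

class TopDownBottomUp3D(nn.Module):
    """Sketch of a top-down-bottom-up 3D feature-volume network:
    the encoder halves the resolution (top-down), the decoder restores
    it (bottom-up), and a skip connection preserves fine detail."""

    def __init__(self, channels: int = 32):
        super().__init__()
        c = channels
        self.down = nn.Sequential(                  # top-down: stride-2 conv
            nn.Conv3d(c, 2 * c, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.up = nn.Sequential(                    # bottom-up: transposed conv
            nn.ConvTranspose3d(2 * c, c, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.head = nn.Conv3d(c, 1, kernel_size=1)  # e.g. one SDF value per voxel

    def forward(self, vol: torch.Tensor) -> torch.Tensor:
        coarse = self.down(vol)                     # (B, 2C, D/2, H/2, W/2)
        fine = self.up(coarse) + vol                # skip connection at full res
        return self.head(fine)                      # (B, 1, D, H, W)

vol = torch.randn(1, 32, 16, 16, 16)                # a 16^3 feature volume
sdf = TopDownBottomUp3D()(vol)                       # -> (1, 1, 16, 16, 16)
```

The quoted work's point is that replacing these convolutions with attention is costly, since the full-resolution stages of such a pipeline contain far more voxels than attention can cheaply handle.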
“…NeuralRecon [31] reconstructs surfaces sequentially from video fragments as TSDF volumes by performing feature fusion from previous fragments via recurrent units. Atlas [14] aggregates image features over an entire sequence to predict a globally consistent TSDF volume and semantic labels with a 3D CNN, while others [2, 28] rely on transformers for feature fusion. While the above methods require direct SDF supervision at training time, inference requires only posed RGB video.…”
Section: Related Work
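For contrast with the transformer-based fusion above, here is a minimal sketch of the plain aggregation such methods replace: back-project every voxel into each view and average the sampled image features over the views that see it. All function and parameter names are hypothetical, and nearest-pixel sampling is an assumption chosen for brevity.

```python
import torch

def average_view_features(voxels_world, feats, projections):
    """Aggregate per-view 2D features into a voxel grid by averaging.

    voxels_world: (N, 4) homogeneous voxel centers in world coordinates
    feats:        (V, C, H, W) per-view 2D feature maps
    projections:  (V, 3, 4) camera projection matrices (world -> pixels)
    Returns:      (N, C) mean feature over the views observing each voxel
    """
    V, C, H, W = feats.shape
    accum = torch.zeros(voxels_world.shape[0], C)
    count = torch.zeros(voxels_world.shape[0], 1)
    for v in range(V):
        pix = voxels_world @ projections[v].T           # (N, 3)
        xy = pix[:, :2] / pix[:, 2:].clamp(min=1e-6)    # perspective divide
        x = xy[:, 0].round().long()
        y = xy[:, 1].round().long()
        # Keep only voxels in front of the camera and inside the image.
        valid = (pix[:, 2] > 0) & (x >= 0) & (x < W) & (y >= 0) & (y < H)
        accum[valid] += feats[v, :, y[valid], x[valid]].T
        count[valid] += 1
    return accum / count.clamp(min=1)                   # avoid divide-by-zero
```

Averaging weights every observing view equally; the appeal of transformer fusion in the quoted works is precisely that attention can instead weight views by how informative they are for each voxel.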