2019
DOI: 10.1007/s11263-019-01217-w

Robust Attentional Aggregation of Deep Feature Sets for Multi-view 3D Reconstruction

Abstract: We study the problem of recovering an underlying 3D shape from a set of images. Existing learning-based approaches usually resort to recurrent neural nets, e.g., GRUs, or intuitive pooling operations, e.g., max/mean pooling, to fuse multiple deep features encoded from input images. However, GRU-based approaches are unable to consistently estimate 3D shapes given the same set of input images, as the recurrent unit is permutation variant. It is also unlikely to refine the 3D shape given more images due to the long…
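The aggregation the abstract alludes to can be illustrated compactly. Below is a minimal sketch, assuming PyTorch; the class name AttentionalAggregation and the single shared linear scoring layer are illustrative choices, not the paper's exact architecture. Each view's feature vector receives a channel-wise attention score, the scores are softmax-normalised across the view axis, and the features are summed with the resulting weights, so the output does not depend on the order of the input views.

import torch
import torch.nn as nn

class AttentionalAggregation(nn.Module):
    """Permutation-invariant pooling over a set of per-view features.

    Sketch of attentional aggregation: a shared linear layer scores each
    channel of each view's feature vector, the scores are softmax-normalised
    across the set (view) dimension, and the features are summed with those
    weights. (Hypothetical layer sizes; not the paper's exact network.)
    """

    def __init__(self, feat_dim: int):
        super().__init__()
        # Shared scoring function: one attention score per feature channel.
        self.score = nn.Linear(feat_dim, feat_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_views, feat_dim), one feature vector per input image.
        scores = self.score(x)                  # (batch, n_views, feat_dim)
        weights = torch.softmax(scores, dim=1)  # normalise across the set
        return (weights * x).sum(dim=1)         # (batch, feat_dim)

# Unlike a GRU, the result is the same for any ordering of the views,
# and any number of views can be fused with the same module.
agg = AttentionalAggregation(256)
x = torch.randn(2, 5, 256)                      # 2 shapes, 5 views each
perm = torch.randperm(5)
assert torch.allclose(agg(x), agg(x[:, perm]), atol=1e-5)

Because both the softmax and the weighted sum run over the set axis, shuffling or adding views only redistributes the weights; this is what allows such a model to estimate the same shape consistently from the same images regardless of input order.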

Cited by 118 publications (75 citation statements). References 40 publications.

Citation statements (ordered by relevance):
“…By contrast, we turn to the powerful attention mechanism to automatically learn important local features. In particular, inspired by [69], our attentive pooling unit consists of the following steps. Computing Attention Scores.…”
Section: Local Feature Aggregation (mentioning; confidence: 99%)
“…Ilse et al. (2018) proposed to use attention-based weighted sum-pooling for multiple instance learning. Similarly, Yang et al. (2020) proposed an attention-based algorithm to aggregate a deep feature set for multi-view 3D reconstruction.…”
Section: Related Work (mentioning; confidence: 99%)
“…Particularly the ModelNet40 dataset (Wu et al., 2015) has been used to test four of the mentioned models. Whilst AttSets (Yang et al., 2020) employs it to formulate a multi-view reconstruction task, the other three methods are tested on the core classification task, with PointNet reaching an accuracy of 0.892 (Qi et al., 2017), DeepSets 0.900 (Zaheer et al., 2017) and the Set Transformer 0.904 (Lee et al., 2019). However, the specific methods used to produce the point clouds from the provided mesh representation of objects showcased certain differences, further highlighting the need for a systematic, uniform comparison.…”
Section: Notes on Dataset Performance (mentioning; confidence: 99%)