2023
DOI: 10.48550/arxiv.2303.11126
Preprint

Robustifying Token Attention for Vision Transformers

Abstract: Despite the success of vision transformers (ViTs), they still suffer from significant drops in accuracy in the presence of common corruptions, such as noise or blur. Interestingly, we observe that the attention mechanism of ViTs tends to rely on a few important tokens, a phenomenon we call token overfocusing. More critically, these tokens are not robust to corruptions, often leading to highly diverging attention patterns. In this paper, we intend to alleviate this overfocusing issue and make attention more stable …
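
The abstract is cut off before the authors' remedy, but the overfocusing observation itself is easy to probe. Below is a minimal sketch, assuming standard post-softmax attention maps of shape (batch, heads, queries, keys); the function name attention_entropy and the toy tensor values are illustrative assumptions, not code from the paper. Low per-query entropy indicates attention mass concentrated on a few tokens, which is the "token overfocusing" pattern the abstract describes.

import torch

def attention_entropy(attn: torch.Tensor, eps: float = 1e-9) -> torch.Tensor:
    # attn: (batch, heads, queries, keys); each row sums to 1 after softmax.
    # Returns per-query entropy in nats, shape (batch, heads, queries).
    # Values near 0 flag "overfocused" queries; log(keys) means uniform attention.
    return -(attn * (attn + eps).log()).sum(dim=-1)

# Toy example: one head over a 4-token sequence (hypothetical logits).
logits = torch.tensor([[[[8.0, 0.1, 0.1, 0.1],   # query 0 focuses on token 0
                         [1.0, 1.0, 1.0, 1.0],   # query 1 attends uniformly
                         [0.1, 8.0, 0.1, 0.1],
                         [1.0, 1.0, 1.0, 1.0]]]])
attn = torch.softmax(logits, dim=-1)
print(attention_entropy(attn))  # ~0.01 for focused rows, ~1.386 (= ln 4) for uniform rows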

Cited by 0 publications
References 50 publications