2021
DOI: 10.48550/arxiv.2103.17239
Preprint

Going deeper with Image Transformers

Abstract: Transformers have recently been adapted for large-scale image classification, achieving high scores that shake up the long supremacy of convolutional neural networks. However, the optimization of image transformers has been little studied so far. In this work, we build and optimize deeper transformer networks for image classification. In particular, we investigate the interplay of architecture and optimization of such dedicated transformers. We make two transformer architecture changes that significantly improve …
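The two architecture changes the truncated abstract alludes to are, in the published CaiT paper, LayerScale and class-attention layers. The block below is a minimal PyTorch sketch of the class-attention idea, in which only the class token queries the patch tokens and is updated; the module name, default head count, and tensor layout are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class ClassAttention(nn.Module):
    """Class-attention sketch: only the class token attends to all tokens.

    Hypothetical module, not the official CaiT code; dims and defaults are
    illustrative assumptions.
    """
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.q = nn.Linear(dim, dim)       # query from the class token only
        self.kv = nn.Linear(dim, dim * 2)  # keys/values from all tokens
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (batch, 1 + num_patches, dim); token 0 is the class token
        B, N, D = x.shape
        H = self.num_heads
        q = self.q(x[:, :1]).reshape(B, 1, H, D // H).transpose(1, 2)         # (B, H, 1, d)
        k, v = self.kv(x).reshape(B, N, 2, H, D // H).permute(2, 0, 3, 1, 4)  # each (B, H, N, d)
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)         # (B, H, 1, N)
        cls = (attn @ v).transpose(1, 2).reshape(B, 1, D)                     # updated class token
        return self.proj(cls)
```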

Cited by 38 publications (49 citation statements).
References 52 publications (107 reference statements).
“…Under isotropic designs, DeiT-B [28] suffers from a high memory requirement despite being only a 12-layer network. Our performance match with CaiT-S24 [29] answers our research question by successfully encapsulating the information into a lower number of tokens (Super tokens) without degrading performance.…”
Section: Results on ImageNet-1k (supporting)
confidence: 57%
“…Both the input and output of the transformer have the same dimensions x_f, y ∈ ℝ^{n×D}. The earlier works ViT [11], DeiT [28] and CaiT [29] followed this design. DeiT [16] introduced distillation from a CNN teacher network, while CaiT [29] proposed a layer scaling operation to aid with the increasing depth of the transformers.…”
Section: Related Work (mentioning)
confidence: 99%
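The "layer scaling operation" the quoted passage attributes to CaiT is LayerScale: the output of each residual branch is multiplied by a learnable per-channel (diagonal) scale initialised to a small value, which stabilises optimization as depth grows. The sketch below is a minimal PyTorch rendering of that description; the class name, the generic `sublayer` argument, and the `init_value` default are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class LayerScaleResidual(nn.Module):
    """Residual block with a LayerScale-style per-channel scale on the branch output.

    A minimal sketch; `sublayer` stands for any attention or feed-forward block,
    and `init_value` is an assumed default, not the paper's exact setting.
    """
    def __init__(self, dim, sublayer, init_value=1e-4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.sublayer = sublayer
        # learnable per-channel scale, started near zero so deep stacks train stably
        self.gamma = nn.Parameter(init_value * torch.ones(dim))

    def forward(self, x):
        return x + self.gamma * self.sublayer(self.norm(x))


# Usage example with a hypothetical MLP sub-layer:
block = LayerScaleResidual(dim=192, sublayer=nn.Sequential(
    nn.Linear(192, 768), nn.GELU(), nn.Linear(768, 192)))
out = block(torch.randn(2, 197, 192))  # (batch, tokens, dim) shape is preserved
```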