2023
DOI: 10.1007/978-1-0716-3195-9_6
Transformers and Visual Transformers

Abstract: Transformers were initially introduced for natural language processing (NLP) tasks, but they were quickly adopted by most deep learning fields, including computer vision. They measure the relationships between pairs of input tokens (words in the case of text strings, parts of images for visual transformers), a mechanism termed attention. The cost is quadratic in the number of tokens. For image classification, the most common transformer architecture uses only the transformer encoder in order to transform the various inpu…
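The pairwise attention the abstract describes can be sketched as scaled dot-product self-attention. This is a minimal, illustrative NumPy version (the function name, shapes, and random weights are assumptions for the sketch, not from the chapter); the explicit (n, n) score matrix is what makes the cost quadratic in the number of tokens:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over n tokens.

    X: (n, d) token embeddings; Wq/Wk/Wv: (d, d) projections.
    The pairwise score matrix is (n, n), so the cost grows
    quadratically with the number of tokens n.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # (n, n) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # (n, d) outputs

rng = np.random.default_rng(0)
n, d = 4, 8                                           # 4 tokens, embedding size 8
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)
```

For text, the n tokens are (sub)words; for visual transformers, they are flattened image patches fed to the same mechanism.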

Cited by 3 publications (1 citation statement). References 32 publications.
“…Since its introduction, it has revolutionized Deep Learning, making breakthroughs in a variety of fields such as Natural Language Processing, Computer Vision, Chemistry, and Biology on its way to becoming the default architecture for learning representations. The standard Transformer [4] has recently been adapted for vision tasks [5]. Once again, visual Transformer has emerged as a key architecture in computer vision.…”
Section: Visual Transformers
confidence: 99%