2023
DOI: 10.3390/drones7050287

A Comprehensive Survey of Transformers for Computer Vision

Abstract: As a special type of transformer, vision transformers (ViTs) can be used for various computer vision (CV) applications. Convolutional neural networks (CNNs) have several potential problems that can be resolved with ViTs. Different variants of ViTs are used for image coding tasks such as compression, super-resolution, segmentation, and denoising. In our survey, we identified the many CV applications to which ViTs are applicable. CV applications reviewed included image classification, object detection, image se…

Cited by 27 publications (6 citation statements)
References 158 publications

“…In recent years, Transformer-based approaches have achieved remarkable results in the fields of language processing, computer vision and also forecasting [15, 24, 25, 26].…”
Section: LSTM (mentioning; confidence: 99%)

“…RDUNet [28] is a residual dense neural network for image denoising based on a densely connected hierarchical network. Recently, transformer technology has been applied to image denoising [29, 30]. Most representatively, swin-transformer UNet for image denoising (SUNet) [31] and swin-transformer-based image restoration (SwinIR) [32] adopt the swin-transformer as the primary module and integrate it into a unique denoising architecture to suppress additive noise.…”
Section: Related Work (mentioning; confidence: 99%)

“…The results show that the main applications of ViTs are as follows: 50% are for image classification, 40% are for object detection, 1% are for segmentation, 1% are for compression, 2% are for super-resolution, 3% are for denoising, and 3% are for anomaly detection [22].…”
Section: Transformer Models in Computer Vision (mentioning; confidence: 99%)