Neural Architecture Search for Transformers: A Survey
2022 | DOI: 10.1109/access.2022.3212767

Abstract: Transformer-based Deep Neural Network architectures have gained tremendous interest due to their effectiveness in various applications across Natural Language Processing (NLP) and Computer Vision (CV) domains. These models are the de facto choice in several language tasks, such as Sentiment Analysis and Text Summarization, replacing Long Short-Term Memory (LSTM) models. Vision Transformers (ViTs) have shown better model performance than traditional Convolutional Neural Networks (CNNs) in vision applications…

Cited by 39 publications (15 citation statements)
References 217 publications
“…In contrast, NAS has not yet been fully explored for ViTs. In [160], the authors surveyed several NAS techniques for ViTs. To the best of our knowledge, there are limited studies on the NAS exploration in ViTs [161][162][163][164][165][166], and more attention is needed in the future.…”
Section: Neural Architecture Search (NAS)
confidence: 99%
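For readers unfamiliar with how such a search is typically set up, the sketch below shows the simplest possible NAS baseline: random search over a toy ViT-style search space. The dimensions, ranges, and the proxy score are illustrative assumptions only; they are not taken from the surveyed paper or from any specific method it covers.

```python
import random

# Toy ViT-style search space. All choices below are illustrative assumptions,
# not drawn from the survey or any method it reviews.
SEARCH_SPACE = {
    "depth":      [8, 10, 12, 14],       # number of transformer blocks
    "embed_dim":  [192, 256, 384, 512],  # token embedding width
    "num_heads":  [3, 4, 6, 8],          # attention heads per block
    "mlp_ratio":  [2.0, 3.0, 4.0],       # hidden/embed width ratio in the MLP
    "patch_size": [8, 16, 32],           # input patch size
}

def sample_architecture(rng: random.Random) -> dict:
    """Draw one candidate architecture uniformly from the search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}

def proxy_score(arch: dict) -> float:
    """Placeholder for the expensive evaluation step (e.g. training plus
    validation accuracy, or a zero-cost proxy). Here it is a dummy heuristic."""
    return arch["depth"] * arch["embed_dim"] / (arch["patch_size"] * 100)

def random_search(n_trials: int = 20, seed: int = 0) -> tuple[dict, float]:
    """Simplest NAS baseline: sample candidates, score them, keep the best."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(n_trials):
        arch = sample_architecture(rng)
        score = proxy_score(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

if __name__ == "__main__":
    arch, score = random_search()
    print("best architecture found:", arch, "score:", round(score, 2))
```

More sophisticated NAS methods replace the uniform sampling with evolutionary, reinforcement-learning, or differentiable search strategies, and replace the dummy proxy with weight-sharing supernets or zero-cost proxies; the overall sample-evaluate-select loop stays the same.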
“…The cross entropy between p(x) and q(x; θ) is given by H(p, q) ≡ −E_p[log q(x; θ)] ≈ −(1/m) Σ_{i=1}^{m} log q(x_i; θ), (16) where the second equality holds if the number of samples (m) is large enough. Equation (16) shows that maximizing the likelihood in (15) with respect to the parameter θ is equivalent to minimizing the cross entropy of (13).…”
Section: Appendix
confidence: 99%
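To make the point of this excerpt concrete, here is a small, self-contained numerical check; the unit-variance Gaussian model q(x; θ), the data distribution p(x), and the sample size are illustrative assumptions of this sketch, not taken from the cited paper. For a large sample from p, the negative average log-likelihood approaches the cross entropy H(p, q), so the θ that maximizes the likelihood also minimizes the empirical cross entropy.

```python
import numpy as np

rng = np.random.default_rng(0)

# "True" data distribution p(x): a standard normal (illustrative assumption).
m = 200_000
x = rng.normal(loc=0.0, scale=1.0, size=m)

def neg_avg_log_likelihood(theta: float) -> float:
    """Empirical cross entropy -(1/m) * sum_i log q(x_i; theta),
    where q(x; theta) is a unit-variance Gaussian with mean theta."""
    return float(np.mean(0.5 * np.log(2 * np.pi) + 0.5 * (x - theta) ** 2))

# Sweep theta: the minimizer of the empirical cross entropy coincides with
# the maximum-likelihood estimate (here, the sample mean).
thetas = np.linspace(-1.0, 1.0, 201)
best_theta = thetas[np.argmin([neg_avg_log_likelihood(t) for t in thetas])]
print(f"argmin of empirical cross entropy: {best_theta:.3f}")
print(f"sample mean (MLE):                 {x.mean():.3f}")
```

Running the sketch prints two values that agree to within the grid resolution, which is exactly the equivalence the excerpt states: minimizing the Monte Carlo estimate of −E_p[log q(x; θ)] is the same optimization as maximizing the average log-likelihood.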
“…Thus, they can be regarded as a special kind of regularization [12] that mitigates overfitting due to small datasets. At present, Transformer has become the mainstream architecture of PTMs for NLP tasks [13]. The well-known pre-trained language models BERT, GPT-2, and GPT-3 [14], [15] are extensions of the Transformer architecture.…”
Section: Introduction
confidence: 99%
“…Utilizing this self‐built prototype system, we successfully obtained hyperspectral images of four distinct bacterial pathogens and analyzed their spectral differences. Additionally, inspired by the great achievements of the Transformer network in natural language processing (NLP) and image processing, such as ChatGPT and ViT [23], this article is dedicated to extending the network architecture to identify hyperspectral images of infectious pathogens.…”
Section: Introduction
confidence: 99%