2022
DOI: 10.48550/arxiv.2201.10801
Preprint

When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism

Abstract: The attention mechanism has been widely believed to be the key to the success of vision transformers (ViTs), since it provides a flexible and powerful way to model spatial relationships. However, is the attention mechanism truly an indispensable part of ViT? Can it be replaced by some other alternative? To demystify the role of the attention mechanism, we simplify it into an extremely simple case: zero FLOPs and zero parameters. Concretely, we revisit the shift operation. It does not contain any parameter or arithmetic calculation…

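For context on what the abstract describes: the shift operation moves a small fraction of channels one pixel toward each of the four spatial neighbors and leaves the remaining channels untouched, so it needs no parameters and no multiply-adds. The sketch below is a minimal illustration of such a four-direction channel shift with zero padding, assuming a PyTorch-style (B, C, H, W) layout; the function name spatial_shift and the default shift ratio gamma are illustrative assumptions, not taken from the paper.

```python
import torch

def spatial_shift(x: torch.Tensor, gamma: float = 1.0 / 12) -> torch.Tensor:
    """Shift gamma * C channels one pixel in each of the four spatial
    directions, zero-padding the vacated border; all remaining channels
    pass through unchanged. x has shape (B, C, H, W).

    Note: gamma = 1/12 is an illustrative default, not the paper's setting.
    """
    B, C, H, W = x.shape
    g = int(C * gamma)  # channels moved per direction
    out = torch.zeros_like(x)

    out[:, 0 * g:1 * g, :, :-1] = x[:, 0 * g:1 * g, :, 1:]   # shift left
    out[:, 1 * g:2 * g, :, 1:]  = x[:, 1 * g:2 * g, :, :-1]  # shift right
    out[:, 2 * g:3 * g, :-1, :] = x[:, 2 * g:3 * g, 1:, :]   # shift up
    out[:, 3 * g:4 * g, 1:, :]  = x[:, 3 * g:4 * g, :-1, :]  # shift down
    out[:, 4 * g:, :, :] = x[:, 4 * g:, :, :]                # untouched channels
    return out


# Example: a 96-channel feature map; 8 channels move per direction.
y = spatial_shift(torch.randn(2, 96, 14, 14))
assert y.shape == (2, 96, 14, 14)
```

Because the shift is pure memory movement, it contributes no parameters and no FLOPs; subsequent channel-mixing layers (e.g., an MLP) can then combine the displaced channels to model spatial relationships.
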
Cited by 1 publication (1 citation statement)
References 21 publications

“…This model's design compromises accuracy on small-sized datasets. Similarly, the Shift-ViT model introduced by Wang et al. [27] replaces attention with a zero-parameter shift operation; classification is performed by a linear layer.…”
Section: Related Work
confidence: 99%