2023
DOI: 10.1007/s11042-023-16954-x
Vision transformer and its variants for image classification in digital breast cancer histopathology: a comparative study

Asmi Sriwastawa,
J. Angel Arul Jothi
Cited by 13 publications (4 citation statements). References 35 publications.
“…The experiment conducted by He et al. [28], which was the only one to utilize the same dataset as ours, achieved an accuracy of 0.79. The experimental results of Wang et al. [22], who used an ensemble of CT + ViT + ATS, show that the original ViT model does not outperform its CNN competitors; likewise, Sriwastawa and Arul Jothi [29], even with the use of different single ViTs, report that none of the models show significantly improved performance compared to existing works. Until now, only a limited number of studies have explored the use of ViTs for breast cancer histology image classification [21,40].…”
Section: Background and Related Work
confidence: 98%
“…Sriwastawa and Arul Jothi [29] presented a broad comparison of the performance of several newer ViT variants, in particular the Pooling-based Vision Transformer (PiT) [30], Convolutional Vision Transformer (CvT) [31], CrossFormer [32], CrossViT [33], NesT [34], MaxViT [35], and Separable Vision Transformer (SepViT) [36], with the aim of showing improvements in the accuracy and generalization ability of ViT. They employed the BreakHis and IDC datasets [37,38].…”
Section: Background and Related Work
confidence: 99%
“…The classification token was incorporated only in the final stage of the CvT. The image is then categorized by applying the MLP head to the classification token, matching the classification procedure employed by ViT [42].…”
confidence: 99%
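The last quoted passage describes the shared final step of ViT and CvT: classification is performed by applying an MLP head to the classification (class) token. A minimal NumPy sketch of that step, assuming hypothetical sizes and random weights (this is an illustration of the general technique, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 1 class token + 196 patch tokens, embedding dim 64,
# binary output (e.g. benign vs. malignant in breast histopathology).
num_tokens, dim, num_classes = 197, 64, 2

# Token embeddings produced by the transformer encoder's final stage.
tokens = rng.standard_normal((num_tokens, dim))

# The classification token is, by convention, the first token in the sequence.
cls_token = tokens[0]

# Simple linear MLP head (random weights stand in for trained parameters).
W = rng.standard_normal((dim, num_classes)) * 0.02
b = np.zeros(num_classes)

# Logits come only from the class token; patch tokens are not pooled here.
logits = cls_token @ W + b

# Softmax over the class logits gives the predicted class distribution.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
pred = int(np.argmax(probs))
```

The key design point the quote makes is that only the class token feeds the head; in CvT this token is introduced only in the final stage, whereas ViT carries it through every layer.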