Vision transformer and its variants for image classification in digital breast cancer histopathology: a comparative study

Sriwastawa, Asmi; Arul Jothi, J. Angel

doi:10.1007/s11042-023-16954-x

Cited by 13 publications

(4 citation statements)

References 35 publications


“…The experiment conducted by He et al [28], which was the only one to utilize the same dataset as us, achieved an accuracy rate of 0.79. The experimental results of Wang et al [22], who used an ensemble of CT + ViT + ATS, show that the original ViT model does not present superior performance compared to its CNN competitors, and also Sriwastawa and Arul Jothi [29], even with the use of different single ViTs, explain that none of the models reveal a significantly improved performance compared to existing works. Until now, only a limited number of studies have explored the use of ViTs in the field of breast cancer histology image for classification [21,40].…”

Section: Background and Related Work (mentioning)

confidence: 98%

“…Sriwastawa and Arul Jothi [29] presented a wide comparison between performances of several newer models of the ViT, in particular the Pooling-based Vision Transformer (PiT) [30], Convolutional Vision Transformer (CvT) [31], CrossFormer [32], CrossViT [33], NesT [34], MaxViT [35], and Separable Vision Transformer (SepViT) [36], with the aim to show the enhancement of the accuracy and generalization ability of ViT. They employed the BreakHis and IDC datasets [37,38].…”

Section: Background and Related Work (mentioning)

confidence: 99%

“…Regarding the ViT models, not only the forming of an ensemble of networks with different preprocessing modalities is experimented on, as shown by Wang et al [22] and Tummala et al [24], but also only one type of network is investigated, as shown by He et al [28] and Sriwastawa and Arul Jothi [29]. In those cases, they utilize the BreakHis dataset for two- and eight-class classifications, which are not directly comparable with the BACH dataset we employ containing four classes.…”

Section: Background and Related Work (mentioning)

confidence: 99%


Optimizing Vision Transformers for Histopathology: Pretraining and Normalization in Breast Cancer Classification

Baroni, Rasotto, Roitero et al. 2024, J. Imaging

This paper introduces a self-attention Vision Transformer model specifically developed for classifying breast cancer in histology images. We examine various training strategies and configurations, including pretraining, dimension resizing, data augmentation and color normalization strategies, patch overlap, and patch size configurations, in order to evaluate their impact on the effectiveness of the histology image classification. Additionally, we provide evidence for the increase in effectiveness gathered through geometric and color data augmentation techniques. We primarily utilize the BACH dataset to train and validate our methods and models, but we also test them on two additional datasets, BRACS and AIDPATH, to verify their generalization capabilities. Our model, developed from a transformer pretrained on ImageNet, achieves an accuracy rate of 0.91 on the BACH dataset, 0.74 on the BRACS dataset, and 0.92 on the AIDPATH dataset. Using a model based on the prostate small and prostate medium HistoEncoder models, we achieve accuracy rates of 0.89 and 0.86, respectively. Our results suggest that pretraining on large-scale general datasets like ImageNet is advantageous. We also show the potential benefits of using domain-specific pretraining datasets, such as extensive histopathological image collections as in HistoEncoder, though not yet with clear advantages.


“…The classification token was exclusively incorporated in the final step of the CvT. The image undergoes categorization using the utilization of the MLP head on the classification tokens, aligning with the categorization procedure employed by ViT [42].…”

mentioning

confidence: 99%

Enhanced classification of microplastic polymers (polyethylene, polystyrene, low‐density polyethylene, polyhydroxyalkanoate) in waterbodies

Thavasimuthu, Vidhya, Sridhar et al. 2024, Polymers for Advanced Techs

The contamination of microplastics (MPs) creates a substantial risk to both the environment and human health, necessitating the development of efficient methods for detecting and categorizing these micro pollutant particles. As a solution, Dense‐UNet with Convolutional Vision Transformer (Dense‐UNet‐CvT), a novel deep learning (DL)‐based model is proposed to detect and classify the MPs by performing the computer vision tasks. The main objective of this work is to enhance the detection accuracy in detecting the MPs classified from the input images. Initially, a holographic MPs image dataset comprising primary classes such as polyethylene (PE), polystyrene (PS), low‐density polyethylene (LDPE), polyhydroxyalkanoate (PHA) is collected for training and evaluating the research model. The images from the dataset are preprocessed by performing image resizing, Recursive Exposure based Sub‐Image Histogram Equalization (RESIHE)‐based image enhancement, Gaussian Adaptive Bilateral Filtering (GABF)‐based denoising to improve the visual quality of the images. The preprocessed images are applied for segmentation using the Dense‐UNet model for performing semantic segmentation. The CvT model is implemented to extract useful features and to perform classification on detecting the known and unknown classes of MPs labeled in the collected dataset. The MPs detection and classification performances are computed in terms of detection rate, accuracy, f1‐score, and precision. The Dense‐UNet‐CvT model achieved 98.22% detection rate, 98.59% accuracy, 98.35% f1‐score, and 98.76% precision. These performances are compared with the current models for proper validation, in which the research model outperformed all the compared models in terms of performance. Overall, the proposed Dense‐UNet‐CvT model demonstrates superior performance across multiple evaluation metrics, suggesting its effectiveness in detecting and classifying MPs contamination in holographic images.


Vision transformer based convolutional neural network for breast cancer histopathological images classification

Abimouloud, Bensid, Elleuch et al. 2024, Multimed Tools Appl
