2023
DOI: 10.3390/app13095521

Comparing Vision Transformers and Convolutional Neural Networks for Image Classification: A Literature Review

Abstract: Transformers are models that implement a self-attention mechanism, individually weighting the importance of each part of the input data. Their use in image classification is still somewhat limited: researchers have so far favored Convolutional Neural Networks for image classification, while transformers were targeted mainly at Natural Language Processing (NLP) tasks. Therefore, this paper presents a literature review that shows the differences between Vision Transformers (ViT) and Convolutional Neural …

Cited by 143 publications (32 citation statements)
References 29 publications
“…The deep learning feature extraction is accomplished using a vision transformer based model (ViT-Base) [14]. Different from conventional neural networks, vision transformer utilizes the self-attention mechanism to focus on the most important regions of the target image, based on which the most meaningful features were computed for certain classification or prediction tasks [15]. Since our limited dataset cannot support sufficient fine-tuning or optimization of the ViT model, we directly used the pre-trained ViT model as a fixed feature extractor.…”
Section: Methods
confidence: 99%
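The self-attention mechanism this excerpt relies on can be illustrated with a minimal sketch (single head, random projections in NumPy — the token count and dimensions are illustrative assumptions, not values from the cited papers):

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence of token embeddings.

    x: (n_tokens, d_model) inputs; w_q / w_k / w_v: (d_model, d_head) projections.
    Returns the attended outputs and the attention-weight matrix."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])           # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax: each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 32))   # e.g. 16 image patches, 32-dim embeddings
w_q, w_k, w_v = (rng.normal(size=(32, 8)) for _ in range(3))
out, attn = self_attention(tokens, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (16, 8) (16, 16)
```

The attention matrix is what lets a ViT weight "the most important regions of the target image": row i gives the distribution of patch i's attention over every other patch.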
“…On the other hand, vision transformers do not contain inductive biases. Also, the combination of CNNs and Transformers was applied to image processing [186], which contributed to reducing the consumption of computing resources and training time [187,188]. The main disadvantages of Transformers are the large amounts of computational resources and the long training times they require.…”
Section: Neural Network and Learning Algorithms In The Medical Image ...
confidence: 99%
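The CNN–Transformer combinations mentioned above typically save compute by letting convolutions collapse the image into a small set of patch tokens before any attention is applied. A minimal sketch of that token-reduction step (NumPy; the patch size and embedding width are illustrative assumptions, and the strided-convolution patch embedding is written here as an explicit reshape):

```python
import numpy as np

def conv_patch_embed(img, w, patch=4):
    """Patch embedding (equivalent to a stride-`patch` convolution):
    split the image into non-overlapping patches and project each to a token.

    img: (H, W, C); w: (patch*patch*C, d_model). Returns (n_patches, d_model)."""
    H, W, C = img.shape
    x = img.reshape(H // patch, patch, W // patch, patch, C)
    x = x.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * C)
    return x @ w  # one token per patch

rng = np.random.default_rng(1)
img = rng.normal(size=(32, 32, 3))
w = rng.normal(size=(4 * 4 * 3, 64))
tokens = conv_patch_embed(img, w)
print(tokens.shape)  # (64, 64): 8x8 grid -> 64 tokens instead of 1024 pixels
```

Because self-attention cost grows quadratically with the number of tokens, shrinking 1024 pixel positions to 64 patch tokens is where the resource savings the excerpt describes come from.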
“…The Convolution Vision Transformer structure merges the advantages of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). Convolutional Neural Networks (CNNs) are recognised for their efficiency in processing local features through their convolutional layers, while Vision Transformers (ViTs) excel at capturing global dependencies in an image through self-attention mechanisms (Maurício et al, 2023). The PixelShuffle operation, also known as sub-pixel convolution, is a technique mainly used for upscaling images in super-resolution tasks (Wang et al, 2023b).…”
Section: Introduction
confidence: 99%
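The PixelShuffle operation referenced in this excerpt trades channel depth for spatial resolution by interleaving groups of r² channels into r×r spatial blocks. A minimal NumPy sketch of the rearrangement, following the usual (C·r², H, W) → (C, H·r, W·r) convention (a sketch of the reshuffle only, not of a full super-resolution model):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Sub-pixel convolution rearrangement: (C*r^2, H, W) -> (C, H*r, W*r).

    Output pixel (c, h*r+i, w*r+j) is taken from input channel c*r^2 + i*r + j
    at position (h, w), so each r^2-channel group becomes one r x r block."""
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)      # split channels into (c, i, j)
    x = x.transpose(0, 3, 1, 4, 2)    # -> (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)

x = np.arange(8 * 3 * 3, dtype=float).reshape(8, 3, 3)  # C=2, r=2, 3x3 maps
y = pixel_shuffle(x, 2)
print(y.shape)  # (2, 6, 6): 2x upscaling by consuming 4 channels per output
```

A convolution that emits C·r² channels followed by this reshuffle is the standard way super-resolution networks upscale without transposed convolutions.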