Computer-aided diagnosis of retinopathy based on vision transformer (2022)
DOI: 10.1142/s1793545822500092

Abstract: Age-related Macular Degeneration (AMD) and Diabetic Macular Edema (DME) are two common retinal diseases in elderly people that may ultimately cause irreversible blindness. Timely and accurate diagnosis is essential for the treatment of these diseases. In recent years, computer-aided diagnosis (CAD) has been investigated in depth and used effectively for rapid and early diagnosis. In this paper, we propose a CAD method that uses a vision transformer to analyze optical coherence tomography (OCT) images and to automat…

Cited by 27 publications (15 citation statements) | References 25 publications
“…C.1 — In adapting Swin-UNETR to the domain of automatic retinal OCT lesion segmentation, this work contributed to a recent research strand that focuses on the use of ViTs for the automated segmentation of retinal lesions in OCT images. In particular, in our study, we used a hybrid Transformer-CNN that, unlike the pure transformers used in recent contributions [25,26], demands less training data and thus does not require pretraining. In this work, the adaptation of Swin-UNETR is limited to experimenting with different feature sizes.…”
Section: Discussion (mentioning), confidence: 99%
“…(2) A CNN-based feature extractor produces a semantic gap that is subsequently bridged by adaptively fusing multi-scale features, while our study assumes a weaker semantic gap thanks to a transformer-based feature extractor. Other relevant studies are those of Kihara [25] and Jiang [26], where transformers have been considered for analyzing OCT images. Kihara and coauthors focused on nonexudative macular neovascularization and proposed a transformer-based encoder-decoder architecture for the segmentation task.…”
Section: Introduction (mentioning), confidence: 99%
“…After feature extraction, the class token is fed to a fully connected layer for classification using MHSA (Figure 4). In this paper, the ViT-B/16 model, a derivative of ViT, is chosen, consisting of a stack of 12 blocks, each containing 16 attention mechanisms [38,39].…”
Section: Methods (mentioning), confidence: 99%
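The class-token classification scheme described in the excerpt above can be sketched in a few lines. The snippet below is a minimal illustration, not the cited authors' implementation: it uses a single attention head with random weights for brevity (a standard ViT-B/16 stacks 12 transformer blocks with 12 attention heads each, and the "16" in its name refers to the 16×16-pixel patch size), and all names (`vit_classify`, `cls_token`, `w_head`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def vit_classify(patches, w_q, w_k, w_v, w_head, cls_token):
    # Prepend the class token so it can attend to every patch embedding.
    x = np.vstack([cls_token, patches])              # (1 + N, D)
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))   # global self-attention
    out = attn @ v
    # Only the class-token row feeds the classification head.
    return softmax(out[0] @ w_head)                  # class probabilities

rng = np.random.default_rng(0)
D, N, C = 8, 4, 3                                    # embed dim, patches, classes
probs = vit_classify(rng.normal(size=(N, D)),
                     rng.normal(size=(D, D)),
                     rng.normal(size=(D, D)),
                     rng.normal(size=(D, D)),
                     rng.normal(size=(D, C)),
                     rng.normal(size=(1, D)))
```

The key design point the quote highlights is that only the class token's output representation is used for the final prediction; the patch tokens exist to be attended to.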
“…CNN models use convolutional kernels to extract features from a fixed range of the input, which can be referred to as local feature extraction. Vision Transformer (ViT) models, which use the encoder of the Transformer base model, have achieved good results in the field of image classification [47,48,49]. Feature extraction with ViT differs from CNN in that there is more similarity between the representations obtained at shallow and deep layers.…”
Section: Introduction (mentioning), confidence: 99%
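The local-versus-global contrast drawn in this last excerpt can be made concrete with a toy 1-D example. This is an illustrative sketch under invented weights (neither function comes from any cited paper): the convolution mixes only a fixed window of neighbours, while a single self-attention step mixes every position with every other, which is one reason shallow and deep ViT layers produce more similar representations.

```python
import numpy as np

def conv1d_local(x, kernel):
    """CNN-style extraction: each output depends only on a fixed
    local window of the input (the kernel's receptive field)."""
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

def attention_global(x):
    """ViT-style mixing: every position attends to all positions,
    so the receptive field is global even at the first layer."""
    scores = np.outer(x, x)                                # all pairwise similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
local = conv1d_local(x, np.array([0.25, 0.5, 0.25]))  # local == [2.0, 3.0, 4.0]
mixed = attention_global(x)                           # every entry mixes all 5 inputs
```

With this smoothing kernel, each `local` value is a weighted average of just three neighbours, whereas each `mixed` value is a weighted average over the entire input.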