Support of deep learning to classify vocal fold images in flexible laryngoscopy

Tran, Bich Anh; Dao, Thao Thi Phuong; Dung, Ho Dang Quy; Van, Ngoc Boi; Ha, Chanh Cong; Pham, Nam Hoang; Nguyen, Tu Cong Huyen Ton Nu Cam; Nguyen, Tan-Cong; Minh-Khoi, Pham; Tran, Mai-Khiem; Tran, Truong Minh; Tran, Minh-Triet

doi:10.1016/j.amjoto.2023.103800

Cited by 7 publications

(5 citation statements)

References 13 publications


“…Of these, 142 did not meet our eligibility criteria, leading to 34 total studies being included in our review. This included 18 studies utilizing patient voice, 10,24‐40 15 studies using images from laryngoscopy, 12,41‐54 and 1 study using both as input for their deep learning models 55 . The study selection process is illustrated in the PRISMA flowchart (Figure 1).…”

Section: Results (mentioning)

confidence: 99%

“…Finally, deep learning also outperformed general practitioner and expert otolaryngologist clinical examination in 6 of 7 studies that compared the 2 12,29,31,46‐48,51 . In addition to potential improved accuracy, neural networks have a significant advantage over physicians in classification speed, with Zhao et al reporting a rate of fifteen seconds per image for physicians compared to 0.01 seconds per image for MobileNetV2 and He et al displaying rates of 5.5 and 0.01 seconds per image for physicians and InceptionV3, respectively 46,54 .…”

Section: Discussion (mentioning)

confidence: 99%

“…Finally, deep learning also outperformed general practitioner and expert otolaryngologist clinical examination in 6 of 7 studies that compared the 2. 12,29,31,46‐48,51 In addition to potential improved accuracy, neural networks have a significant advantage over physicians in classification speed, with Zhao et al reporting a rate of fifteen seconds per image for physicians compared to 0.01 seconds per image for MobileNetV2 and He et al displaying rates of 5.5 and 0.01 seconds per image for physicians and InceptionV3, respectively. 46,54 These findings indicate that while these studies have generalizability limitations discussed below, deep learning algorithms have the potential to serve as tools for general practitioner screening and augmentation of expert laryngologist decision-making.…”

Section: Discussion (mentioning)

confidence: 99%

“…46,51 A fifth study, Tran et al, compared 5 CNNs with an otolaryngology resident and attending, with 3 outperforming and 1 scoring equally with the resident, but all 5 displaying lower accuracy than the attending (Table 5). 48…”

Section: Acoustic Group (mentioning)

confidence: 99%


The Use of Deep Learning Software in the Detection of Voice Disorders: A Systematic Review

Barlow, Sragi, Rivera‐Rivera et al. 2024

Otolaryngol Head Neck Surg.


Objective: To summarize the use of deep learning in the detection of voice disorders using acoustic and laryngoscopic input, compare specific neural networks in terms of accuracy, and assess their effectiveness compared to expert clinical visual examination. Data Sources: Embase, MEDLINE, and Cochrane Central. Review Methods: Databases were screened through November 11, 2023 for relevant studies. The inclusion criteria required studies to utilize a specified deep learning method, use laryngoscopy or acoustic input, and measure accuracy of binary classification between healthy patients and those with voice disorders. Results: Thirty‐four studies met the inclusion criteria, with 18 focusing on voice analysis, 15 on imaging analysis, and 1 both. Across the 18 acoustic studies, 21 programs were used for identification of organic and functional voice disorders. These technologies included 10 convolutional neural networks (CNNs), 6 multilayer perceptrons (MLPs), and 5 other neural networks. The binary classification systems yielded a mean accuracy of 89.0% overall, including 93.7% for MLP programs and 84.5% for CNNs. Among the 15 imaging analysis studies, a total of 23 programs were utilized, resulting in a mean accuracy of 91.3%. Specifically, the twenty CNNs achieved a mean accuracy of 92.6% compared to 83.0% for the 3 MLPs. Conclusion: Deep learning models were shown to be highly accurate in the detection of voice pathology, with CNNs most effective for assessing laryngoscopy images and MLPs most effective for assessing acoustic input. While deep learning methods outperformed expert clinical exam in limited comparisons, further studies integrating external validation are necessary.
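The imaging figures quoted in this abstract can be cross-checked with a count-weighted mean. The sketch below is only an arithmetic check, assuming the reported group means are per-program averages; the helper name `pooled_mean` is hypothetical and not taken from the review.

```python
# Counts and mean accuracies (%) for the imaging group, as quoted in the abstract.
imaging = {"CNN": (20, 92.6), "MLP": (3, 83.0)}


def pooled_mean(groups):
    """Count-weighted mean over {name: (n_programs, mean_accuracy)} groups."""
    total = sum(n for n, _ in groups.values())
    return sum(n * acc for n, acc in groups.values()) / total


imaging_mean = pooled_mean(imaging)
# (20 * 92.6 + 3 * 83.0) / 23 = 2101 / 23
print(round(imaging_mean, 1))  # 91.3, matching the overall figure in the abstract
```

The pooled value agrees with the 91.3% overall accuracy reported for the 23 imaging programs. The same check cannot be completed for the acoustic group, because the abstract does not give a mean for the 5 other neural networks.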



“…A novel Deep-Learning-Based Mask R-CNN Model was presented, which identified Laryngeal Cancer from CT images [14]. The Xception model was used to classify three classes: normal vocal folds, abnormal, and no finding from laryngoscopy images [15]. An early glottic cancer detection model was proposed, employing ensemble learning of Convolutional Neural Network classifiers based on voice and laryngeal imaging [16].…”

Section: Introduction (mentioning)

confidence: 99%

Dual Deep Learning and Feature-Based Models for Classification of Laryngeal Squamous Cell Carcinoma Using Narrow Band Imaging

Sharmila Joseph, Vidyarthi 2024


Laryngeal Squamous Cell Carcinoma (LSCC) is a prevalent form of laryngeal cancer that originates from the mucosal surface of the larynx. The visual analysis of laryngeal tissue vascular patterns poses a significant challenge, as it heavily relies on the expertise and experience of medical practitioners. This paper proposes a dual approach for the early diagnosis of LSCC by employing a lightweight Deep Convolutional Neural Network (CNN) and statistical features. It further delves into feature visualization and interpretation of the proposed classification models. Methods: The initial step involves enhancing image quality through Contrast Limited Adaptive Histogram Equalization (CLAHE). In the first approach, we employ a modified SqueezeNet for classifying laryngeal tissues. In the second approach, we extract a combination of first-order statistical features (Percentile-25, Percentile-50, Percentile-75, Mean, and Standard Deviation of each RGB channel) and second-order statistical features such as Contrast, Energy, Homogeneity, and Correlation from the Gray-Level Co-Occurrence Matrix (GLCM). These features are then classified using the Extreme Gradient Boosting (XGBoost) classification model. Results: The proposed models are trained and validated using an augmented publicly available dataset, prepared for both binary and multiclass classifications. The results indicate that the proposed models demonstrate exceptional accuracy and efficiency in classifying types of laryngeal cancer.
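To make the feature set named in this abstract concrete, the following is a minimal pure-Python sketch of the first-order statistics for one channel and of the four GLCM properties. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the single horizontal offset, the symmetric normalization, the quantile interpolation method, and the toy 4-level image are all choices made here, and the CLAHE and XGBoost steps are omitted.

```python
import math
import statistics


def first_order_stats(channel):
    """Percentile-25/50/75, mean, and standard deviation of one channel (2D list)."""
    pixels = [v for row in channel for v in row]
    # Assumption: linear interpolation between order statistics ("inclusive").
    p25, p50, p75 = statistics.quantiles(pixels, n=4, method="inclusive")
    return {"p25": p25, "p50": p50, "p75": p75,
            "mean": statistics.fmean(pixels), "std": statistics.pstdev(pixels)}


def glcm(gray, levels, dr=0, dc=1):
    """Normalized, symmetric gray-level co-occurrence matrix for offset (dr, dc)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(gray), len(gray[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = gray[r][c], gray[r2][c2]
                counts[i][j] += 1  # count the pair in both directions
                counts[j][i] += 1
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("image too small for the chosen offset")
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Contrast, energy, homogeneity, and correlation of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    var_i = sum((i - mu_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mu_j) ** 2 * v for _, j, v in cells)
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    denom = math.sqrt(var_i * var_j)
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        # Assumption: energy as sqrt of the sum of squares; some texts omit the sqrt.
        "energy": math.sqrt(sum(v * v for _, _, v in cells)),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        # Convention chosen here: a constant image gets correlation 1.0.
        "correlation": cov / denom if denom > 0 else 1.0,
    }


# Toy 4-level grayscale image (hypothetical data, for illustration only).
gray = [[0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 2, 2, 2],
        [2, 2, 3, 3]]
matrix = glcm(gray, levels=4)
features = glcm_features(matrix)
stats = first_order_stats([[0, 10], [20, 30]])
print(features)
print(stats)
```

In a pipeline like the one described, the first-order statistics would be computed once per RGB channel and concatenated with the GLCM properties into one feature vector per image before being passed to a classifier.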


Improving Laryngoscopy Image Analysis Through Integration of Global Information and Local Features in VoFoCD Dataset

Dao, Huynh, Pham et al. 2024

J Digit Imaging. Inform. med.

