Laryngeal videoendoscopy is one of the main tools in clinical examinations for voice disorders and voice research. Using high-speed videoendoscopy, it is possible to fully capture the vocal fold oscillations, however, processing the recordings typically involves a time-consuming segmentation of the glottal area by trained experts. Even though automatic methods have been proposed and the task is particularly suited for deep learning methods, there are no public datasets and benchmarks available to compare methods and to allow training of generalizing deep learning models. In an international collaboration of researchers from seven institutions from the EU and USa, we have created BaGLS, a large, multihospital dataset of 59,250 high-speed videoendoscopy frames with individually annotated segmentation masks. The frames are based on 640 recordings of healthy and disordered subjects that were recorded with varying technical equipment by numerous clinicians. the BaGLS dataset will allow an objective comparison of glottis segmentation methods and will enable interested researchers to train their own models and compare their methods.
Purpose High-speed videoendoscopy (HSV) is an emerging, but barely used, endoscopy technique in the clinic to assess and diagnose voice disorders because of the lack of dedicated software to analyze the data. HSV allows to quantify the vocal fold oscillations by segmenting the glottal area. This challenging task has been tackled by various studies; however, the proposed approaches are mostly limited and not suitable for daily clinical routine. Method We developed a user-friendly software in C# that allows the editing, motion correction, segmentation, and quantitative analysis of HSV data. We further provide pretrained deep neural networks for fully automatic glottis segmentation. Results We freely provide our software Glottis Analysis Tools (GAT). Using GAT, we provide a general threshold-based region growing platform that enables the user to analyze data from various sources, such as in vivo recordings, ex vivo recordings, and high-speed footage of artificial vocal folds. Additionally, especially for in vivo recordings, we provide three robust neural networks at various speed and quality settings to allow a fully automatic glottis segmentation needed for application by untrained personnel. GAT further evaluates video and audio data in parallel and is able to extract various features from the video data, among others the glottal area waveform, that is, the changing glottal area over time. In total, GAT provides 79 unique quantitative analysis parameters for video- and audio-based signals. Many of these parameters have already been shown to reflect voice disorders, highlighting the clinical importance and usefulness of the GAT software. Conclusion GAT is a unique tool to process HSV and audio data to determine quantitative, clinically relevant parameters for research, diagnosis, and treatment of laryngeal disorders. Supplemental Material https://doi.org/10.23641/asha.14575533
This study presents a framework for a direct comparison of experimental vocal fold dynamics data to a numerical two-mass-model (2MM) by solving the corresponding inverse problem of which parameters lead to similar model behavior. The introduced 2MM features improvements such as a variable stiffness and a modified collision force. A set of physiologically sensible degrees of freedom is presented, and three optimization algorithms are compared on synthetic vocal fold trajectories. Finally, a total of 288 high-speed video recordings of six excised porcine larynges were optimized to validate the proposed framework. Particular focus lay on the subglottal pressure, as the experimental subglottal pressure is directly comparable to the model subglottal pressure. Fundamental frequency, amplitude and objective function values were also investigated. The employed 2MM is able to replicate the behavior of the porcine vocal folds very well. The model trajectories' fundamental frequency matches the one of the experimental trajectories in [Formula: see text] of the recordings. The relative error of the model trajectory amplitudes is on average [Formula: see text]. The experiments feature a mean subglottal pressure of 10.16 (SD [Formula: see text]) [Formula: see text]; in the model, it was on average 7.61 (SD [Formula: see text]) [Formula: see text]. A tendency of the model to underestimate the subglottal pressure is found, but the model is capable of inferring trends in the subglottal pressure. The average absolute error between the subglottal pressure in the model and the experiment is 2.90 (SD [Formula: see text]) [Formula: see text] or [Formula: see text]. A detailed analysis of the factors affecting the accuracy in matching the subglottal pressure is presented.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.