Disordered voices are frequently assessed by speech pathologists through perceptual evaluation. This can be problematic because of the subjective nature of the process and the influence of external factors that compromise the quality of the assessment. To increase the reliability of the evaluations, the design of automatic evaluation systems is desirable. With that in mind, this paper presents an automatic system that assesses the Grade and Roughness levels of speech according to the GRBAS perceptual scale. Two parameterization methods are used: one based on the classic Mel-Frequency Cepstral Coefficients (MFCCs), which have already been used successfully in previous work, and another derived from modulation spectra. For the latter, a new group of parameters, named Modulation Spectra Morphological Parameters, is proposed: MSC, DRB, LMR, MSH, MSW, CIL, PALA, and RALA. PCA and LDA are employed to reduce the dimensionality of the feature space, and GMM classifiers to evaluate the ability of the proposed features to distinguish the different levels. Efficiencies of 81.6% and 84.7% are obtained for Grade and Roughness, respectively, using the modulation spectra parameters, while MFCCs achieve 80.5% and 77.7%. These results suggest the usefulness of the proposed Modulation Spectra Morphological Parameters for the automatic evaluation of Grade and Roughness in speech.
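The classification stage summarized above (one statistical model per perceptual level, decision by maximum score) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it replaces the multi-component GMMs with single-component, diagonal-covariance Gaussians fitted per level, and the function names and feature vectors are hypothetical.

```python
import numpy as np

def fit_diag_gaussian(X):
    # Simplified per-level "GMM": a single Gaussian component with a
    # diagonal covariance, estimated from that level's training features.
    mu = X.mean(axis=0)
    var = X.var(axis=0) + 1e-6  # small floor to avoid division by zero
    return mu, var

def log_likelihood(x, mu, var):
    # Log-density of feature vector x under a diagonal Gaussian.
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)

def classify(x, models):
    # Assign the perceptual level whose model scores x highest.
    scores = {lvl: log_likelihood(x, mu, var) for lvl, (mu, var) in models.items()}
    return max(scores, key=scores.get)
```

In practice one model would be trained per GRBAS level (e.g. Grade 0-3) on PCA/LDA-reduced feature vectors, and an unseen recording would be assigned the level of the best-scoring model.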
This review provides a comprehensive compilation, from a digital image processing point of view, of the most important techniques currently developed to characterize and quantify the vibratory behaviour of the vocal folds, along with a detailed description of the laryngeal imaging modalities currently used in the clinic. It gives an overview of the most significant glottal-gap segmentation and facilitative-playback techniques reported in the literature for this purpose, and discusses the drawbacks and challenges that remain unsolved in developing robust tools, based on digital image processing, for analysing vocal fold vibratory function.
Background: The image-based analysis of vocal fold vibration plays an important role in the diagnosis of voice disorders. The analysis relies not only on direct observation of the video sequences, but also on an objective characterization of the phonation process by means of features extracted from the recorded images. However, such analysis requires a prior, accurate identification of the glottal gap, which is the most challenging step for any further automatic assessment of vocal fold vibration. Methods: In this work, a complete framework to automatically segment and track the glottal area (or glottal gap) is proposed. The algorithm identifies a region of interest (ROI) that is adapted over time and combines active contours and the watershed transform for the final delineation of the glottis; an automatic procedure to synthesize different videokymograms (VKGs) is also proposed. Results: Thanks to the ROI implementation, the technique is robust to camera shifting, and objective tests proved the effectiveness and performance of the approach in the most challenging scenario, namely when there is an inappropriate closure of the vocal folds. Conclusions: The novelty of the proposed algorithm lies in the use of temporal information to identify an adaptive ROI and in the combination of watershed merging with active contours for glottis delimitation. Additionally, an automatic procedure to synthesize multiline VKGs from the identification of the main glottal axis is developed.
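A deliberately simplified stand-in for the segmentation step is sketched below: it thresholds dark pixels inside a fixed ROI and accumulates a per-frame glottal-area signal, the basic waveform behind kymogram-style analysis. The paper's actual method combines watershed merging with active contours and adapts the ROI over time; the percentile threshold, function names, and fixed ROI here are assumptions for illustration only.

```python
import numpy as np

def segment_glottis(frame, roi, pct=20):
    # Toy segmentation: inside the ROI the glottal gap appears darker
    # than the surrounding folds, so keep the darkest pct% of ROI pixels.
    r0, r1, c0, c1 = roi
    sub = frame[r0:r1, c0:c1].astype(float)
    thr = np.percentile(sub, pct)
    mask = np.zeros(frame.shape, dtype=bool)
    mask[r0:r1, c0:c1] = sub <= thr
    return mask

def glottal_area_waveform(frames, roi):
    # Per-frame glottal area (pixel count) across the sequence.
    return np.array([segment_glottis(f, roi).sum() for f in frames])
```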
The visual examination of the vibration patterns of the vocal folds is an essential method for understanding the phonation process and diagnosing voice disorders. However, a detailed analysis of phonation based on this technique requires a manual or semi-automatic segmentation of the glottal area, which is difficult and time consuming. The present work presents a quasi-automatic framework to accurately segment the glottal area, introducing several techniques not explored before in the state of the art. The method takes advantage of minimal user intervention for those cases where the automatic computation fails. The presented method achieves a reliable delimitation of the glottal gap, with an average improvement of 13% and 18% with respect to two other approaches found in the literature, while reducing the error of wrongly detecting total closure instants. Additionally, the results suggest that the proposed set of validation guidelines can be used to standardize the criteria for evaluating the accuracy and efficiency of segmentation algorithms.
The present work describes a new procedure to find the region of interest (ROI) within a high-speed laryngeal video sequence. The approach is based on analysing the average intensity variation in both the columns and the rows of the images. The variation in each column is considered first: the resulting profile resembles a Gaussian whose maximum peak corresponds to the column with the greatest average intensity variation. To determine the cut-off points, the data are fitted to a Gaussian distribution; the mean corresponds to the point of maximum intensity variation, and the tolerance interval generates the boundaries of the cropped image. The procedure is then repeated on the cropped sequence using the average intensity variation in the rows, yielding the ROI. The performance and effectiveness of the approach were validated on 18 high-speed digital videos in which the images show an inappropriate closure of the vocal folds.
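The two-pass procedure described above can be sketched roughly as follows. The assumptions are flagged: the Gaussian fit is done here by moment matching on the variation profile, the tolerance interval is taken as mean ± k standard deviations, and `find_roi` and its parameters are illustrative names; the paper's exact fitting and tolerance computation may differ.

```python
import numpy as np

def variation_profile(frames, axis):
    # Mean absolute frame-to-frame intensity change, averaged over the
    # other spatial axis -> 1-D "average intensity variation" profile.
    diff = np.abs(np.diff(frames.astype(float), axis=0)).mean(axis=0)
    return diff.mean(axis=axis)

def gaussian_bounds(profile, k=2.0):
    # Moment-matched Gaussian fit: treat the profile as a weight over
    # positions, then cut off at mean +/- k standard deviations.
    idx = np.arange(profile.size)
    mu = np.average(idx, weights=profile)
    sigma = np.sqrt(np.average((idx - mu) ** 2, weights=profile))
    lo = max(0, int(mu - k * sigma))
    hi = min(profile.size - 1, int(np.ceil(mu + k * sigma)))
    return lo, hi

def find_roi(frames, k=2.0):
    """frames: (T, H, W) grayscale stack. First pass bounds the columns;
    second pass, on the cropped sequence, bounds the rows."""
    col_lo, col_hi = gaussian_bounds(variation_profile(frames, axis=0), k)
    cropped = frames[:, :, col_lo:col_hi + 1]
    row_lo, row_hi = gaussian_bounds(variation_profile(cropped, axis=1), k)
    return row_lo, row_hi, col_lo, col_hi
```

On a synthetic sequence where only a small patch oscillates in brightness, the returned bounds enclose that patch, which is the behaviour the procedure relies on for locating the vibrating glottal region.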