This paper evaluates the accuracy of different characterization methods for the automatic detection of multiple speech disorders. The speech impairments considered include dysphonia in people with Parkinson's disease (PD), dysphonia diagnosed in patients with different laryngeal pathologies (LP), and hypernasality in children with cleft lip and palate (CLP). Four methods are applied to analyze the voice signals: noise content measures, spectral-cepstral modeling, nonlinear features, and measurements that quantify the stability of the fundamental frequency. These measures are tested on six databases: three with recordings of PD patients, two with patients with LP, and one with children with CLP. The abnormal vibration of the vocal folds observed in PD patients and in people with LP is modeled using the stability measures, with accuracies ranging from 81% to 99% depending on the pathology. The spectral-cepstral features are used in this paper to model the voice spectrum, with special emphasis on the region around the first two formants. These measures exhibit accuracies ranging from 95% to 99% in the automatic detection of hypernasal voices, which confirms the presence of changes in the speech spectrum due to hypernasality. Noise measures suitably discriminate between dysphonic and healthy voices in both databases with speakers suffering from LP. The results obtained in this study suggest that no single kind of feature is suitable for modeling all of the voice pathologies; instead, it is necessary to study the physiology of each impairment to choose the most appropriate set of features.
Abstract: Parkinson's disease (PD) is a neurodegenerative disorder characterized by the loss of dopaminergic neurons in the midbrain. About 90% of people with PD have been shown to also develop speech impairments, exhibiting symptoms such as monotonic speech, low pitch intensity, inappropriate pauses, imprecise consonants, and problems in prosody; although these problems are well identified, only 3% to 4% of the patients receive speech therapy. The research community has addressed the automatic detection of PD by means of noise measures; however, such works have considered only the phonation of the English vowel /a/. In this paper, the five Spanish vowels uttered by 50 people with PD and 50 healthy controls (HC) are evaluated automatically using a set of four noise measures: Harmonics to Noise Ratio (HNR), Normalized Noise Energy (NNE), Cepstral HNR (CHNR), and Glottal to Noise Excitation Ratio (GNE). The decision on whether a speech recording comes from a person with PD or from an HC is made by a k-nearest neighbors (k-NN) classifier, which achieves an accuracy of 66.57% when only the vowel /i/ is considered.
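The classification step described above can be illustrated with a minimal sketch of a k-NN decision over four-dimensional noise-measure vectors. This is not the authors' implementation: the function name `knn_predict`, the choice of k = 3, the Euclidean distance, and all feature values below are assumptions made purely for illustration, and the numbers are made-up toy values rather than data from the study.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points.

    Distance is plain Euclidean distance (an assumption; the abstract
    does not state which metric or which k was used).
    """
    # Pair every training point's distance to x with its label, nearest first.
    dists = sorted(
        (math.dist(features, x), label)
        for features, label in zip(train_X, train_y)
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors ordered as [HNR, NNE, CHNR, GNE].
# These values are invented for the example and carry no clinical meaning.
train_X = [
    [20.0, -15.0, 25.0, 0.90],
    [19.0, -14.0, 24.0, 0.88],
    [21.0, -16.0, 26.0, 0.92],
    [12.0, -8.0, 15.0, 0.60],
    [11.0, -7.0, 14.0, 0.55],
    [13.0, -9.0, 16.0, 0.62],
]
train_y = ["HC", "HC", "HC", "PD", "PD", "PD"]

# Classify one unseen (also hypothetical) recording.
print(knn_predict(train_X, train_y, [12.5, -8.5, 15.5, 0.61]))
```

In practice the four measures live on different scales, so standardizing each feature before computing distances would likely be needed; the sketch omits that step to stay short.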