Assessing the severity level of dysarthria can provide insight into a patient's improvement, assist speech-language pathologists in planning therapy, and aid automatic dysarthric speech recognition systems. In this article, we present a comparative study on the classification of dysarthria severity levels using different deep learning techniques and acoustic features. First, we evaluate basic architectural choices, namely the deep neural network (DNN), convolutional neural network (CNN), gated recurrent unit (GRU), and long short-term memory (LSTM) network, using the basic speech features Mel-frequency cepstral coefficients (MFCCs) and constant-Q cepstral coefficients. Next, speech-disorder-specific features computed from prosody, articulation, phonation, and glottal functioning are evaluated on DNN models. Finally, we explore the utility of low-dimensional feature representations obtained through subspace modeling in the form of i-vectors, which are then classified using DNN models. Evaluation is carried out on the standard UA-Speech and TORGO databases. The DNN classifier using MFCC-based i-vectors outperforms the other systems, achieving an accuracy of 93.97% in the speaker-dependent scenario and 49.22% in the speaker-independent scenario on the UA-Speech database.
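The abstract does not give implementation details, but the best-performing pipeline (MFCC features reduced to a fixed-length utterance vector, then classified by a DNN) can be sketched as follows. This is a minimal illustration, not the paper's method: it assumes a 16 kHz mono WAV input, substitutes simple mean/std pooling for the paper's i-vector subspace model, and every name here (SEVERITY_LEVELS, utterance_embedding, the layer sizes) is hypothetical.

```python
import librosa
import numpy as np
import torch
import torch.nn as nn

# Hypothetical label set; the paper's actual severity categories,
# i-vector extractor, and DNN hyperparameters are not in the abstract.
SEVERITY_LEVELS = ["very_low", "low", "medium", "high"]

def utterance_embedding(wav_path, sr=16000, n_mfcc=13):
    """Load one utterance and pool its MFCC frames to a fixed-length vector.
    Mean/std pooling stands in for the paper's i-vector front end."""
    y, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])  # (2*n_mfcc,)

class SeverityDNN(nn.Module):
    """Small fully connected classifier over the utterance embedding."""
    def __init__(self, in_dim=26, hidden=64, n_classes=len(SEVERITY_LEVELS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):
        return self.net(x)  # raw logits; train with nn.CrossEntropyLoss

# Usage sketch:
# emb = torch.tensor(utterance_embedding("utt.wav"), dtype=torch.float32)
# logits = SeverityDNN()(emb.unsqueeze(0))
# pred = SEVERITY_LEVELS[logits.argmax(dim=1).item()]
```

A real i-vector front end would instead train a GMM universal background model and a total-variability matrix over many speakers; the pooled statistics above merely play the same role of a compact per-utterance representation.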
This paper presents the design and development of a new multilingual emotional speech corpus, TaMaR-EmoDB (Tamil Malayalam Ravula-Emotion DataBase), and its evaluation using a deep neural network (DNN) baseline system. The corpus consists of utterances from three languages, namely Malayalam, Tamil, and Ravula, a tribal language. It comprises short speech utterances in four emotions (anger, anxiety, happiness, and sadness) along with neutral utterances. A subset of the corpus is first evaluated using a perception test to understand how well humans identify the emotional state expressed in the speech. Machine evaluation is then performed using a fusion of spectral and prosodic features in a DNN framework. In the classification phase, the system achieves average precisions of 0.78, 0.60, and 0.61 and recalls of 0.84, 0.61, and 0.53 for Malayalam, Tamil, and Ravula, respectively. The database can potentially serve as a new linguistic resource enabling future research in speech emotion detection, corpus-based prosody analysis, and speech synthesis.
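As an illustration of feature-level fusion of spectral and prosodic cues, the sketch below pools MFCC, F0, and energy statistics per utterance. The paper's exact feature set, fusion scheme, and DNN topology are not given in the abstract, so the function name, the 13-coefficient MFCC choice, and the pyin F0 search range are all assumptions; only the emotion labels come from the abstract.

```python
import librosa
import numpy as np

# Emotion labels taken from the abstract; everything else is illustrative.
EMOTIONS = ["anger", "anxiety", "happiness", "sadness", "neutral"]

def fused_features(wav_path, sr=16000):
    """Concatenate spectral (MFCC) and prosodic (F0, energy) statistics
    into one utterance-level vector for a downstream DNN classifier."""
    y, _ = librosa.load(wav_path, sr=sr)
    # Spectral stream: MFCC mean/std over frames.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    spectral = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
    # Prosodic stream: F0 from pYIN (voiced frames only) and RMS energy.
    f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                                 fmax=librosa.note_to_hz("C6"), sr=sr)
    f0 = f0[voiced] if voiced.any() else np.zeros(1)
    energy = librosa.feature.rms(y=y)[0]
    prosodic = np.array([np.nanmean(f0), np.nanstd(f0),
                         energy.mean(), energy.std()])
    # Feature-level fusion: simple concatenation of the two streams.
    return np.concatenate([spectral, prosodic])
```

The resulting vector would then be fed to a DNN with one output per entry in EMOTIONS; per-language precision and recall such as those reported in the abstract follow from comparing its predictions against the annotated labels.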