Purpose
Dysarthria is a neuromotor speech disorder caused by neuromuscular disturbances that affect one or more articulators resulting in unintelligible speech. Though inter-phoneme articulatory variations are well captured by formant frequency-based acoustic features, these variations are expected to be much higher for dysarthric speakers than normal. These substantial variations can be well captured by placing sensors in appropriate articulatory position. This study focuses to determine a set of articulatory sensors and parameters in order to assess articulatory dysfunctions in dysarthric speech.
Design/methodology/approach
The current work aims to determine significant sensors and parameters associated using motion path and correlation analyzes on the TORGO database of dysarthric speech. Among eight informative sensor channels and six parameters per channel in positional data, the sensors such as tongue middle, back and tip, lower and upper lips and parameters (y, z, φ) are found to contribute significantly toward capturing the articulatory information. Acoustic and positional data analyzes are performed to validate these identified significant sensors. Furthermore, a convolutional neural network-based classifier is developed for both phone-and word-level classification of dysarthric speech using acoustic and positional data.
Findings
The average phone error rate is observed to be lower, up to 15.54% for positional data when compared with acoustic-only data. Further, word-level classification using a combination of both acoustic and positional information is performed to study that the positional data acquired using significant sensors will boost the performance of classification even for severe dysarthric speakers.
Originality/value
The proposed work shows that the significant sensors and parameters can be used to assess dysfunctions in dysarthric speech effectively. The articulatory sensor data helps in better assessment than the acoustic data even for severe dysarthric speakers.