It is commonly acknowledged that word or phoneme intelligibility is an important criterion in assessing the communication efficiency of a pathological speaker. Considerable effort has therefore gone into the design of perceptual intelligibility rating tests. These tests usually have two drawbacks: they employ unnatural speech material (e.g., nonsense words), and they cannot fully exclude errors due to listener bias. There is therefore growing interest in applying objective automatic speech recognition technology to automate intelligibility assessment. Current research is directed at designing automated methods that can be shown to produce ratings corresponding well with those that emerge from a well-designed and well-performed perceptual test. This paper presents a novel methodology that builds on previous work (Middag et al., 2008). It uses phonological features, automatic speech alignment based on acoustic models trained on normal speech, context-dependent speaker feature extraction, and intelligibility prediction based on a small model that can be trained on pathological speech samples. The experimental evaluation of the new system reveals that the root mean squared error of the discrepancies between perceived and computed intelligibilities can be as low as 8 on a scale of 0 to 100.
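To make the evaluation metric concrete, the following is a minimal sketch of how a root mean squared error between perceived and computed intelligibility scores could be calculated. The function name `rmse` and the example scores are illustrative assumptions, not data or code from the paper.

```python
import math


def rmse(perceived, computed):
    """Root mean squared error between two equally long score lists."""
    if len(perceived) != len(computed) or not perceived:
        raise ValueError("score lists must be non-empty and of equal length")
    # Square each per-speaker discrepancy, average, then take the root.
    squared = [(p - c) ** 2 for p, c in zip(perceived, computed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical scores on a 0-100 scale (not taken from the paper).
perceived = [90.0, 72.0, 55.0, 38.0]
computed = [84.0, 80.0, 61.0, 30.0]

error = rmse(perceived, computed)
print(f"RMSE: {error:.2f}")
```

Because the discrepancies are squared before averaging, a few large mismatches between listener and system weigh more heavily than many small ones.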
The phoneme intelligibility scores of dysarthric speakers obtained by the three investigated intelligibility model types are reliable. The highest correlation between the perceptual and objective intelligibility scores was found for models combining phonemic and phonological features. The intelligibility scoring system is now ready to be implemented in a clinical tool.
Intelligibility is now a popular measure of the severity of a pathological speaker's articulatory deficiencies. This measure is usually obtained by means of a perceptual test consisting of nonconventional and/or nonconnected words. In previous work, we developed a system incorporating two automatic speech recognizers (ASRs) that could estimate phoneme intelligibility (PI) fairly accurately. In the present paper, we propose a novel method that aims to assess running speech intelligibility (RSI) as a more relevant indicator of the communication efficiency of a speaker in a natural setting. The proposed method computes a phonological characterization of the speaker by means of a statistical analysis of frame-level phonological features. Importantly, this analysis requires no knowledge of what the speaker was supposed to say. The new characterization is demonstrated to predict PI and to provide valuable information about the nature and severity of the pathology.
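As an illustration of what a statistical analysis of frame-level phonological features might look like, the sketch below summarizes each feature by its mean and standard deviation over all frames of an utterance. The choice of these two statistics, the function name `characterize_speaker`, and the example values are assumptions made for illustration; the abstract does not specify which statistics the method actually uses. Note that no transcript of the intended utterance is needed.

```python
from statistics import mean, pstdev


def characterize_speaker(frames):
    """Summarize per-frame feature vectors as [mean, std] per feature.

    `frames` is a list of equally long lists, one per speech frame, each
    holding hypothetical phonological feature scores in the range 0..1.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    n_features = len(frames[0])
    # Regroup the data so that each column holds one feature over time.
    columns = [[frame[i] for frame in frames] for i in range(n_features)]
    # Emit mean and population standard deviation for every feature.
    return [stat(column) for column in columns for stat in (mean, pstdev)]


# Three hypothetical frames with two features each (e.g. voicing, nasality).
frames = [[0.9, 0.1], [0.7, 0.3], [0.8, 0.2]]
speaker_vector = characterize_speaker(frames)
print(speaker_vector)
```

A fixed-length vector of this kind could then serve as input to a small regression model that predicts an intelligibility score.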