In the context of pathological speech, perceptual evaluation is still the most widely used method for intelligibility estimation. Despite being a staple in clinical settings, it is well known to be subjective, which results in high variance and low reproducibility. Meanwhile, owing to increasing computing power and recent research, automatic evaluation has become a growing alternative to perceptual assessment. In this paper we investigate the automatic prediction of speech intelligibility using the x-vector paradigm, in the context of head and neck cancer (HNC). Experimental evaluation of the proposed model suggests a high correlation when applied to our corpus of HNC patients (ρ = 0.85). Our approach also showed that very high correlation values (ρ = 0.95) can be achieved when adapting the evaluation to each individual speaker, yielding a significantly more accurate prediction while using smaller amounts of data. These results can also provide valuable insight for the redevelopment of test protocols, which tend to be lengthy and effort-intensive for patients.
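As an illustration of how a correlation between predicted and perceptual intelligibility scores can be computed, the following is a minimal sketch. It assumes a Spearman rank correlation, which the abstract does not specify, and the scores are invented toy values, not data from the study.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-speaker scores (assumption: one value per speaker).
perceptual = [2.0, 4.5, 6.0, 7.5, 9.0]
predicted = [2.4, 4.0, 6.8, 6.5, 8.7]
rho = spearman(perceptual, predicted)
print(f"rho = {rho:.2f}")
```

Ranking before correlating makes the measure insensitive to the scale of the predicted scores, which is one reason rank correlations are common when comparing against ordinal perceptual ratings.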
The automatic prediction of speech intelligibility can be seen as a growing and relevant alternative to the perceptual evaluations used clinically, which are known to be biased, variable and subjective. We propose an automatic way to regress an intelligibility score based on a recurrent model with a self-attention mechanism. This approach not only achieved a high correlation of 0.87 when applied to a pseudo-word task designed for head and neck cancers, but also reduced the error by more than 50% compared to previous approaches. Moreover, we studied the reliability of the same system when operating with smaller amounts of data at inference time. The results suggest that the linguistic sample can be reduced to only 30% of its full size without losing performance. This supports the reliability of using a smaller subset of data when predicting intelligibility, which can be extremely useful for preventing patient fatigue by creating shorter batteries of clinical exams.
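The idea of predicting from a reduced sample at inference time can be sketched as follows. This is only an illustration under stated assumptions: it supposes that a speaker-level score is the mean of per-item predictions, which the abstract does not confirm, and the per-item values are randomly generated placeholders, not outputs of the recurrent model.

```python
import random
from statistics import mean


def subset_score(item_scores, fraction, rng):
    """Average the predictions of a random subset of the items.

    `fraction` is the share of items kept (0.30 keeps 30% of them).
    """
    k = max(1, round(fraction * len(item_scores)))
    return mean(rng.sample(item_scores, k))


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical per-item predictions for one speaker
# (assumption: one predicted score per pseudo-word).
item_scores = [rng.gauss(6.0, 1.0) for _ in range(50)]

full = mean(item_scores)
reduced = subset_score(item_scores, 0.30, rng)
print(f"full sample: {full:.2f}, 30% subset: {reduced:.2f}")
```

Repeating the subset draw many times and comparing each result with the full-sample score is one simple way to check how stable a reduced protocol is.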
The automatic prediction of speech intelligibility is a widely known problem in the context of pathological speech. It has been seen as a growing and viable alternative to perceptual evaluation, which is typically time-consuming, highly subjective and strongly biased. For this reason, the development of automatic systems that output not only unbiased predictions but also interpretable scores becomes relevant. In this paper we investigate a method to predict speech intelligibility based on consonant phonetic similarity. The proposed methodology relies on a siamese network to compute similarity scores between healthy and pathological phonemes and, based on the combination of those scores, regresses the intelligibility values. Our experimental evaluation suggests a high baseline correlation of ρ = 0.82 when applied to our corpus of head and neck cancer patients. Moreover, further conditioning of the system on specific phonemes in key contexts increased the correlation up to ρ = 0.89. The methodology also aims to promote interpretability of the predicted intelligibility score, which is highly relevant in a clinical setting.
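The two stages described here, scoring phoneme similarity and then combining the scores, can be sketched in a few lines. This is a hedged illustration, not the authors' model: cosine similarity stands in for the siamese network's output, the linear combination stands in for the regressor, and all embeddings, weights and phoneme labels are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def intelligibility(similarities, weights, bias):
    """Linear combination of per-phoneme similarity scores (assumed form)."""
    return bias + sum(weights[ph] * s for ph, s in similarities.items())


# Hypothetical embeddings, as if produced by two branches with shared weights.
healthy = {"p": [1.0, 0.0], "t": [0.0, 1.0]}
patient = {"p": [1.0, 0.0], "t": [1.0, 1.0]}

similarities = {
    ph: cosine_similarity(healthy[ph], patient[ph]) for ph in healthy
}

# Hypothetical regression weights; in practice these would be fitted.
weights = {"p": 5.0, "t": 5.0}
score = intelligibility(similarities, weights, bias=0.0)
print(similarities, round(score, 2))
```

Keeping one similarity per phoneme is what makes the final score inspectable: a low prediction can be traced back to the phonemes whose similarity dropped.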