Automatic Assessment of Speech Intelligibility using Consonant Similarity for Head and Neck Cancer

Quintas, Sebastião; Mauclair, Julie; Woisard, Virginie; Pinquier, Julien

doi:10.21437/interspeech.2022-182

Cited by 6 publications

(4 citation statements)

References 24 publications

“…Other approaches make use of speech processing technologies based on the extraction of relevant features, such as MFCCs, filterbanks, speaker embeddings, etc. [7,8,9,10]. Nevertheless, it is known that these type of data-driven systems tend to require significant amounts of data in order to operate efficiently [11].…”

Section: Introduction (mentioning)

confidence: 99%

Towards Reducing Patient Effort for the Automatic Prediction of Speech Intelligibility in Head and Neck Cancers

Quintas, Abad, Mauclair et al. 2023

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite

The automatic prediction of speech intelligibility can be seen as a growing and relevant alternative to the perceptual evaluations used clinically, which are known to be biased, variant and subjective. We propose an automatic way to regress an intelligibility score based on a recurrent model with a self-attention mechanism. This approach not only presented a high correlation of 0.87 when applied to a pseudo-word task designed for head and neck cancers, but also a significant decrease in error of more than 50%, when compared to previous approaches. Moreover, we have also studied the reliability of the same system when operating with smaller amounts of data at inference time. The results suggest that we can reduce the linguistic sample size to only 30% of the full sample, without losing performance. This aspect validates the reliability of using a smaller subset of data when predicting intelligibility, which can be extremely useful to prevent patient's fatigue by creating smaller batteries of clinical exams.

“…A variety of approaches and methodologies have been used recently to automatically predict clinical perceptual measures, mainly speech intelligibility. Among these approaches different schools of thought can be identified, such as regressing scores based on automatic speech recognition performance (Schuster et al, 2006; Christensen et al, 2012; Fontan et al, 2017) and also the usage of more traditional signal processing techniques and data-driven methodologies (Bin et al, 2019; Quintas et al, 2022, 2023a). The speaker embedding paradigm, where speech utterances are represented into fixed-dimensional vectors that have discriminating properties among different speakers, has shown interesting gains on general pathological speech assessment (Codosero et al, 2019; Zargarbashi and Babaali, 2019) as well as the specific case of intelligibility prediction (Laaridh et al, 2018; Quintas et al, 2020).…”

Section: Introduction (mentioning)

confidence: 99%

SAMI: an M-Health application to telemonitor intelligibility and speech disorder severity in head and neck cancers

Quintas, Vaysse, Balaguer et al. 2024

Front. Artif. Intell.

Self Cite

Perceptual measures, such as intelligibility and speech disorder severity, are widely used in the clinical assessment of speech disorders in patients treated for oral or oropharyngeal cancer. Despite their widespread usage, these measures are known to be subjective and hard to reproduce. Therefore, an M-Health assessment based on an automatic prediction has been seen as a more robust and reliable alternative. Despite recent progress, these automatic approaches still remain somewhat theoretical, and a need to implement them in real clinical practice arises. Hence, in the present work we introduce SAMI, a clinical mobile application used to predict speech intelligibility and disorder severity as well as to monitor patient progress on these measures over time. The first part of this work illustrates the design and development of the systems supported by SAMI. Here, we show how deep neural speaker embeddings are used to automatically regress speech disorder measurements (intelligibility and severity), as well as the training and validation of the system on a French corpus of head and neck cancer. Furthermore, we also test our model on a secondary corpus recorded in real clinical conditions. The second part details the results obtained from the deployment of our system in a real clinical environment, over the course of several weeks. In this section, the results obtained with SAMI are compared to an a posteriori perceptual evaluation, conducted by a set of experts on the newly recorded data. The comparison suggests a high correlation and a low error between the perceptual and automatic evaluations, validating the clinical usage of the proposed application.

“…On the other hand, these systems that take in consideration the perception uncertainty normally do so in order to increase the performance metrics on the gold standard, instead of identifying and assessing individual judge profiles that can come across. Nevertheless, despite the recent progress in the automatic prediction of speech intelligibility, score interpretability is still a big issue (Quintas et al., 2022) which typically impairs the widespread clinical usage of these technologies.…”

Section: Introduction (mentioning)

confidence: 99%

Automatic modelling of perceptual judges in the context of head and neck cancer speech intelligibility

Quintas, Balaguer, Mauclair et al. 2024

Intl J Lang & Comm Disor

Self Cite

Background: Perceptual measures such as speech intelligibility are known to be biased, variant and subjective, to which an automatic approach has been seen as a more reliable alternative. On the other hand, automatic approaches tend to lack explainability, an aspect that can prevent the widespread usage of these technologies clinically.

Aims: In the present work, we aim to study the relationship between four perceptual parameters and speech intelligibility by automatically modelling the behaviour of six perceptual judges, in the context of head and neck cancer. From this evaluation we want to assess the different levels of relevance of each parameter as well as the different judge profiles that arise, both perceptually and automatically.

Methods and Procedures: Based on a passage reading task from the Carcinologic Speech Severity Index (C2SI) corpus, six expert listeners assessed the voice quality, resonance, prosody and phonemic distortions, as well as the speech intelligibility of patients treated for oral or oropharyngeal cancer. A statistical analysis and an ensemble of automatic systems, one per judge, were devised, where speech intelligibility is predicted as a function of the four aforementioned perceptual parameters of voice quality, resonance, prosody and phonemic distortions.

Outcomes and Results: The results suggest that we can automatically predict speech intelligibility as a function of the four aforementioned perceptual parameters, achieving a high correlation of 0.775 (Spearman's ρ). Furthermore, different judge profiles were found perceptually that were successfully modelled automatically.

Conclusions and Implications: The four investigated perceptual parameters influence the global rating of speech intelligibility, showing that different judge profiles emerge. The proposed automatic approach displayed a more uniform profile across all judges, displaying a more reliable, unbiased and objective prediction. The system also adds an extra layer of interpretability, since speech intelligibility is regressed as a direct function of the individual prediction of the four perceptual parameters, an improvement over more black box approaches.

WHAT THIS PAPER ADDS

What is already known on this subject: Speech intelligibility is a clinical measure typically used in the post-treatment assessment of speech affecting disorders, such as head and neck cancer. Their perceptual assessment is currently the main method of evaluation; however, it is known to be quite subjective since intelligibility can be seen as a combination of other perceptual parameters (voice quality, resonance, etc.). Given this, automatic approaches have been seen as a more viable alternative to the traditionally used perceptual assessments.

What this study adds to existing knowledge: The present work introduces a study based on the relationship between four perceptual parameters (voice quality, resonance, prosody and phonemic distortions) and speech intelligibility, by automatically modelling the behaviour of six perceptual judges. The results suggest that different judge profiles arise, both in the perceptual case as well as in the automatic models. These different profiles found showcase the different schools of thought that perceptual judges have, in comparison to the automatic judges, that display more uniform levels of relevance across all the four perceptual parameters. This aspect shows that an automatic approach promotes unbiased, reliable and more objective predictions.

What are the clinical implications of this work? The automatic prediction of speech intelligibility, using a combination of four perceptual parameters, show that these approaches can achieve high correlations with the reference scores while maintaining a certain degree of explainability. The more uniform judge profiles found on the automatic case also display less biased results towards the four perceptual parameters. This aspect facilitates the clinical implementation of this class of systems, as opposed to the more subjective and harder to reproduce perceptual assessments.
