Interspeech 2018
DOI: 10.21437/interspeech.2018-1631

Estimation of Hypernasality Scores from Cleft Lip and Palate Speech

Abstract: Hypernasality refers to the perception of excessive nasal resonances in vowels and voiced consonants. Existing speech processing based approaches concentrate only on the classification of speech into normal or hypernasal, which do not give the degree of hypernasality in terms of continuous values like nasometer. Motivated by the functionality of nasometer, in this work, a method is proposed for the evaluation of hypernasality. Speech signals representing two extremely opposite cases of nasality are used to dev…

Cited by 13 publications (16 citation statements)
References 10 publications

“…This suggests that these features are a robust measure of hypernasality, relatively invariant to the disease-specific co-modulating variables that hinder the performance of the baselines on the same task. The nasalization features in the NAP, by virtue of being trained on a large corpus of healthy speech, and targeting a specific perceptual quality are simultaneously more robust to both the disease-specific overfitting expected from NN methods such as [55] and speaker-to-speaker variances discussed in the design of the formant-based A1P0 and related features in [66], [68]. Articulatory precision features are robust in a similar way.…”
Section: Discussion (mentioning)
confidence: 99%
“…Mel-frequency cepstral coefficients (MFCCs) and other spectral transformations [37], [38], [36], [39], [40], [41], [42], [43], [44], glottal source related features (jitter and shimmer) [45], [46], difference between the low-pass and bandpass profile of the Teager Energy Operator (TEO) [47], [48], and non-linear features [49], [50] have all been proposed as model input features. Gaussian mixture models (GMM), support vector machines, and deep neural networks have been used in conjunction with these features for hypernasality evaluation from word and sentence level data [51], [52], [53], [54]. Recently, end-to-end neural networks taking MFCC frames as input and producing hypernasality assessments as output have also been proposed [55].…”
Section: A Related Work (mentioning)
confidence: 99%
“…A very similar approach of using i-vector for regression task has been adopted in both [43] and [44]. GMM and DNN based models trained using MFCC features were employed for the task of hypernasality estimation in [45]. Two acoustic models trained on a large corpus of healthy speech, one to measure the nasal resonance from voiced sounds and another to measure the articulatory imprecision from unvoiced sounds was used in [46] to estimate hypernasality in dysarthric subjects.…”
Section: Reference-based Approaches (mentioning)
confidence: 99%