Hypernasality refers to the perception of excessive nasal resonance in vowels and voiced consonants. Existing speech-processing-based approaches concentrate only on classifying speech as normal or hypernasal and do not give the degree of hypernasality as a continuous value, as a nasometer does. Motivated by the functionality of the nasometer, this work proposes a method for the evaluation of hypernasality. Speech signals representing two opposite extremes of nasality are used to develop the acoustic models: oral sentences (rich in vowels, stops, and fricatives) of normal speakers and nasal sentences (rich in nasals and nasalized vowels) of moderate-severe hypernasal speakers represent the groups with the minimum and maximum attainable degrees of nasality, respectively. The acoustic features derived from glottal activity regions are used to model the maximum and minimum nasality classes using Gaussian mixture model and deep neural network approaches. The posterior probabilities obtained for the nasal sentence class are referred to as hypernasality scores. The scores show a significant correlation (p < 0.01) with perceptual ratings of hypernasality provided by expert speech-language pathologists. Further, the hypernasality scores are used for the detection of hypernasality, and the results are compared with the nasometer-based approach.
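The scoring idea in this abstract (a posterior probability for the nasal class, given one model per nasality extreme) can be illustrated with a minimal sketch. The abstract does not say how frame-level evidence is pooled, so averaging frame posteriors is an assumption here, as are the equal class priors, the diagonal-covariance GMM layout, and every function name below.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    return logsumexp([math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm])

def hypernasality_score(frames, oral_gmm, nasal_gmm, prior_nasal=0.5):
    """Average frame-level posterior of the nasal class (pooling is an assumption)."""
    posteriors = []
    for x in frames:
        log_nasal = math.log(prior_nasal) + frame_loglik(x, nasal_gmm)
        log_oral = math.log(1.0 - prior_nasal) + frame_loglik(x, oral_gmm)
        # Bayes rule in the log domain: p(nasal | x)
        posteriors.append(math.exp(log_nasal - logsumexp([log_nasal, log_oral])))
    return sum(posteriors) / len(posteriors)

# Toy one-dimensional models standing in for trained acoustic models.
oral_gmm = [(1.0, [0.0], [1.0])]
nasal_gmm = [(1.0, [4.0], [1.0])]
score = hypernasality_score([[3.8], [4.1], [3.5]], oral_gmm, nasal_gmm)
```

The score lies between 0 and 1, so it behaves like a continuous degree of nasality and not a hard normal/hypernasal label; thresholding it would give a detector.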
The present work explores the acoustic characteristics of articulatory deviations near g(lottis) landmarks to derive the correlates of cleft lip and palate speech intelligibility. The speech region around the g landmark is used to compute two different acoustic features, namely, two-dimensional discrete cosine transform based joint spectro-temporal features and Mel-frequency cepstral coefficients. Sentence-specific acoustic models are built using these features extracted from the normal speakers' group. The mean log-likelihood score for each test utterance is computed and tested as an acoustic correlate of intelligibility. The derived intelligibility measure shows a significant correlation (ρ = 0.78, p < 0.001) with the perceptual ratings.
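As a rough illustration of a two-dimensional discrete cosine transform based joint spectro-temporal feature, the sketch below applies an orthonormal DCT-II along time and then along frequency of a small time-frequency patch and keeps the low-order coefficients. The patch layout (rows as frequency bands, columns as frames), the number of retained coefficients, and all function names are assumptions; the abstract does not give these details.

```python
import math

def dct_ii(signal):
    """Orthonormal one-dimensional DCT-II of a list of numbers."""
    n = len(signal)
    out = []
    for k in range(n):
        s = sum(signal[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def dct_2d(patch):
    """2D DCT of a patch given as rows (frequency bands) by columns (frames)."""
    along_time = [dct_ii(row) for row in patch]
    # Transpose, transform along frequency, then transpose back.
    along_freq = [dct_ii(list(col)) for col in zip(*along_time)]
    return [list(row) for row in zip(*along_freq)]

def joint_spectro_temporal_features(patch, n_freq=3, n_time=3):
    """Keep the low-order n_freq x n_time block of 2D-DCT coefficients."""
    coeffs = dct_2d(patch)
    return [coeffs[i][j] for i in range(n_freq) for j in range(n_time)]

# Toy 4-band x 5-frame patch standing in for the region around a landmark.
patch = [[0.1 * (b + 1) * (f + 1) for f in range(5)] for b in range(4)]
features = joint_spectro_temporal_features(patch, n_freq=2, n_time=2)
```

Truncating to low-order coefficients gives a compact vector that summarizes slow variation jointly across time and frequency, which is the kind of feature a per-sentence acoustic model could then be trained on.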
This paper deals with the problem of detecting replay attacks on speaker verification systems. In the literature, apart from acoustic features, source features have also been used successfully for this task. In existing source features, only the information around glottal closure instants (GCIs) has been utilized. We hypothesize that a feature derived by capturing the temporal dynamics between two GCIs would be more discriminative for this task. Motivated by that, in this work we explore the use of discrete cosine transform compressed integrated linear prediction residual (ILPR) features for discriminating between genuine and replayed signals. A spoof detection system is built using the compressed ILPR feature and a Gaussian mixture model (GMM) classifier. A baseline system is also built using constant-Q cepstral coefficient features with a GMM backend. These systems are tested on the ASVSpoof 2017 Version 2.0 database. On fusing the systems developed using the acoustic and proposed source features, an equal error rate of 9.41% is achieved on the evaluation set.
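The equal error rate reported here is the operating point at which the false acceptance rate (replayed signals accepted as genuine) equals the false rejection rate (genuine signals rejected). A minimal sketch of how it can be estimated from two score lists follows; the threshold sweep over observed scores, the "higher score means more genuine" convention, and the function name are assumptions, and the toy scores are not from the paper.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    A trial is accepted as genuine when its score is >= the threshold.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap = float("inf")
    eer = 1.0
    for thr in thresholds:
        # False acceptance: spoofed trials that pass the threshold.
        far = sum(s >= thr for s in spoof_scores) / len(spoof_scores)
        # False rejection: genuine trials that fall below the threshold.
        frr = sum(s < thr for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2.0
    return eer

# Toy detector scores (for example, log-likelihood ratios from two GMMs).
genuine = [0.9, 0.8, 0.7, 0.3]
spoof = [0.1, 0.2, 0.4, 0.75]
eer = equal_error_rate(genuine, spoof)
```

With few trials the two error rates may never cross exactly, which is why the sketch reports the midpoint at the closest threshold.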
Intelligibility is considered as one of the primary measures for speech rehabilitation of individuals with a cleft lip and palate (CLP). Currently, speech processing and machine-learning-based objective methods are gaining more research interest as a way to quantify speech intelligibility. In this work, joint spectro-temporal features computed from a time–frequency representation of speech are explored to derive speech representations based on Gaussian posteriograms. A comparative framework using dynamic time warping (DTW) is used to quantify the intelligibility of child CLP speech. The DTW distance is used to score sentence-level intelligibility and tested for correlation with perceptual intelligibility ratings obtained from expert speech-language pathologists. A baseline DTW system using the conventional Mel-frequency cepstral coefficients (MFCCs) is also developed to compare the performance of the proposed system. Spearman's rank correlation coefficient between the objective intelligibility scores and the perceptual intelligibility rating is studied. A Williams significance test is conducted to assess the statistical significance of the correlation difference between the methods. The results show that the system based on joint spectro-temporal features significantly outperforms the MFCC-based system.
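The comparative DTW step can be sketched as follows: a test utterance and a reference utterance, each a sequence of per-frame vectors such as Gaussian posteriors, are aligned by dynamic programming and the accumulated cost serves as the score. The Euclidean local distance, the normalization by the summed sequence lengths, and the function names are assumptions made for illustration; the abstract does not specify them.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Length-normalized DTW cost between two sequences of frame vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] is the best accumulated cost aligning the first i and j frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)

# Toy two-component posterior sequences; the test one is a slowed-down copy.
reference = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
test = [[0.9, 0.1], [0.9, 0.1], [0.5, 0.5], [0.1, 0.9], [0.1, 0.9]]
score = dtw_distance(reference, test)
```

Because the alignment absorbs differences in speaking rate, a larger distance reflects a mismatch in frame content, which is what makes it usable as an inverse indicator of intelligibility.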