Advances in speech signal analysis facilitate the development of techniques for remote biomedical voice assessment. However, the performance of these techniques is affected by noise and distortion in the recorded signals. In this paper, we focus on the sustained vowel /a/, the most widely used voice signal for pathological voice assessment, and investigate the impact of four major types of distortion commonly introduced during recording or transmission, namely background noise, reverberation, clipping, and compression, on mel-frequency cepstral coefficients (MFCCs), the most widely used features in biomedical voice analysis. We then propose a new distortion classification approach to detect the dominant distortion in such voice signals. The proposed method uses MFCCs as frame-level features and a support vector machine (SVM) classifier to detect the presence and type of distortion in the frames of a given voice signal. Experimental results on healthy and Parkinson's voices show the effectiveness of the proposed approach in distortion detection and classification.
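To make the pipeline concrete, the following is a minimal sketch, not the authors' implementation, of frame-level distortion classification using librosa for MFCC extraction and scikit-learn's SVC. The class labels, number of coefficients, and the majority-vote aggregation over frames are assumptions for illustration; the abstract does not specify these details.

```python
# Hypothetical sketch: per-frame MFCC features + SVM distortion classifier.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumed label set; the paper's exact class inventory may differ.
DISTORTION_CLASSES = ["clean", "noise", "reverb", "clipping", "compression"]

def frame_mfccs(path, sr=16000, n_mfcc=13):
    """Extract per-frame MFCCs from a sustained /a/ recording."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T  # shape: (n_frames, n_mfcc)

# X_train: stacked frame-level MFCCs; y_train: one distortion label per frame.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
# clf.fit(X_train, y_train)  # requires labeled training frames

def classify_signal(path):
    """Label each frame, then report the dominant distortion by majority vote."""
    labels = clf.predict(frame_mfccs(path))
    values, counts = np.unique(labels, return_counts=True)
    return values[np.argmax(counts)]
```

Frame-level classification followed by a vote is one plausible way to obtain the signal-level decision the abstract describes; the paper may aggregate frame decisions differently.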
The presence of background noise adversely affects the performance of many speech-based algorithms. Accurate estimation of the signal-to-noise ratio (SNR), as a measure of the noise level in a signal, can help compensate for noise effects. Most existing SNR estimation methods have been developed for normal speech and may not provide accurate estimates for special speech types such as whispered or disordered voices, particularly when these are corrupted by non-stationary noise. In this paper, we first investigate the impact of stationary and non-stationary noise on the behavior of mel-frequency cepstral coefficients (MFCCs) extracted from normal, whispered, and pathological voices. We demonstrate that, regardless of the speech type, the mean and covariance of the MFCCs are predictably modified by additive noise, and that the amount of change is related to the noise level. We then propose a new supervised SNR estimation method based on a regression model trained on the MFCCs of noisy signals. Experimental results show that the proposed approach provides accurate estimates and consistent performance across speech types under different noise conditions.
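The following sketch illustrates the core idea of regressing SNR from MFCC statistics, since the abstract reports that the MFCC mean and covariance shift predictably with noise level. The choice of support vector regression and the use of the upper-triangular covariance entries as the feature vector are assumptions; the abstract does not name the regression model.

```python
# Hypothetical sketch: supervised SNR estimation from MFCC statistics.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def mfcc_stats(y, sr=16000, n_mfcc=13):
    """Mean and upper-triangular covariance entries of frame-level MFCCs."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, n_frames)
    mean = mfcc.mean(axis=1)
    cov = np.cov(mfcc)                 # (n_mfcc, n_mfcc) across frames
    iu = np.triu_indices(n_mfcc)       # covariance is symmetric; keep one half
    return np.concatenate([mean, cov[iu]])

# X: statistics of noisy training signals mixed at known SNRs
# y: the corresponding true SNR values in dB
snr_model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
# snr_model.fit(X, y)
# estimated_snr = snr_model.predict([mfcc_stats(noisy_signal)])
```

Training data for such a model is typically synthesized by mixing clean recordings with noise at controlled SNRs, so the ground-truth labels are known exactly.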
This paper proposes a novel approach to automatic speaker height estimation based on the i-vector framework. In this method, each utterance is modeled by its corresponding i-vector. Artificial neural networks (ANNs) and least-squares support vector regression (LSSVR) are then employed to estimate the speaker's height from a given utterance. The proposed method is trained and tested on telephone speech from the National Institute of Standards and Technology (NIST) 2008 and 2010 Speaker Recognition Evaluation (SRE) corpora, respectively. Evaluation results show the effectiveness of the proposed method for speaker height estimation.
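The regression stage can be sketched as below, assuming i-vectors have already been extracted upstream (UBM training and total-variability modeling are a substantial pipeline in their own right and are not reproduced here). scikit-learn's MLPRegressor stands in for the ANN, and KernelRidge for LSSVR, to which it is closely related; the dimensions and data are placeholders.

```python
# Hypothetical sketch: height regression over precomputed i-vectors.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.neural_network import MLPRegressor

# Placeholder data standing in for real i-vectors and speaker heights (cm).
rng = np.random.default_rng(0)
ivectors = rng.standard_normal((100, 400))  # (n_utterances, ivector_dim)
heights = 150 + 40 * rng.random(100)

# Two regressors, mirroring the ANN / LSSVR comparison in the abstract.
ann = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000).fit(ivectors, heights)
lssvr_like = KernelRidge(kernel="rbf", alpha=1.0).fit(ivectors, heights)

test_ivector = rng.standard_normal((1, 400))
print(ann.predict(test_ivector), lssvr_like.predict(test_ivector))
```

Kernel ridge regression solves a closed-form kernel least-squares problem very similar to LSSVR (differing mainly in the bias term), which is why it is used as the stand-in here.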