Noise reduction for noise robust feature extraction for distributed speech recognition

Noe, B.; Sienel, Juergen; Jouvet, Denis; Mauuary, Laurent; Boves, L.W.J.; Veth, J.M. de; Wet, Febe de

doi:10.21437/eurospeech.2001-116

Cited by 13 publications

References 3 publications

Extraction of Speaker Features from Different Stages of DSR Front-Ends for Distributed Speaker Verification

Mak; Sit; Kung

2005

Int J Speech Technol

The ETSI has recently published a front-end processing standard for distributed speech recognition systems. The key idea of the standard is to extract the spectral features of speech signals at the front-end terminals so that acoustic distortion caused by communication channels can be avoided. This paper investigates the effect of extracting spectral features from different stages of the front-end processing on the performance of distributed speaker verification systems. A technique that combines handset selectors with stochastic feature transformation is also employed in a back-end speaker verification system to reduce the acoustic mismatch between different handsets. Because the feature vectors obtained from the back-end server are vector quantized, the paper proposes two approaches to adding Gaussian noise to the quantized feature vectors for training the Gaussian mixture speaker models. In one approach, the variances of the Gaussian noise are made dependent on the codeword distance. In another approach, the variances are a function of the distance between some unquantized training vectors and their closest code vector. The HTIMIT corpus was used in the experiments, and results based on 150 speakers show that stochastic feature transformation can be added to the back-end server for compensating transducer distortion. It is also found that better verification performance can be achieved when the LMS-based blind equalization in the standard is replaced by stochastic feature transformation.

Correspondence should be sent to M.W. Mak, Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong. Email: enmwmak@polyu.edu.hk. Tel: (852)27666257. Fax: (852)23628439.
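The first noise-injection approach mentioned in the abstract, where the noise variance depends on the codeword distance, might be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: the abstract does not give the exact variance rule, so the sketch assumes a standard deviation equal to a hypothetical `scale` factor times the Euclidean distance from each codeword to its nearest other codeword. The function name `dequantize_with_noise` and its parameters are likewise invented for this example.

```python
import math
import random


def dequantize_with_noise(indices, codebook, scale=0.5, rng=None):
    """Map VQ indices to codewords and add zero-mean Gaussian noise.

    Assumption (not taken from the paper): the noise standard deviation
    for a codeword is `scale` times the Euclidean distance to its nearest
    other codeword. The codebook must hold at least two codewords.
    """
    if rng is None:
        rng = random.Random()

    # Distance from each codeword to its nearest *other* codeword.
    nearest = [
        min(math.dist(c, other) for j, other in enumerate(codebook) if j != i)
        for i, c in enumerate(codebook)
    ]

    noisy = []
    for k in indices:
        sigma = scale * nearest[k]
        # Independent noise on every dimension of the selected codeword.
        noisy.append([x + rng.gauss(0.0, sigma) for x in codebook[k]])
    return noisy


# Toy 2-D codebook and a short sequence of received VQ indices.
codebook = [[0.0, 0.0], [3.0, 4.0], [10.0, 10.0]]
training_vectors = dequantize_with_noise([0, 1, 1, 2], codebook,
                                         rng=random.Random(0))
```

The resulting vectors would then be used in place of the raw codewords when training the Gaussian mixture speaker models, so that repeated codewords no longer collapse onto identical points.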
