Kris Hermus scite author profile

The objective of this paper is threefold: (1) to provide an extensive review of signal subspace speech enhancement, (2) to derive an upper bound for the performance of these techniques, and (3) to present a comprehensive study of the potential of subspace filtering to increase the robustness of automatic speech recognisers against stationary additive noise distortions. Subspace filtering methods are based on the orthogonal decomposition of the noisy speech observation space into a signal subspace and a noise subspace. This decomposition is possible under the assumption of a low-rank model for speech, provided that an estimate of the noise correlation matrix is available. We present an extensive overview of the available estimators, and derive a theoretical estimator to experimentally assess an upper bound on the performance that can be achieved by any subspace-based method. Automatic speech recognition (ASR) experiments with noisy data demonstrate that subspace-based speech enhancement can significantly increase the robustness of these systems in additive coloured noise environments. Optimal performance is obtained only if no explicit rank reduction of the noisy Hankel matrix is performed. Although this strategy might increase the level of the residual noise, it reduces the risk of removing signal information that is essential for the recogniser's back end. Finally, it is also shown that subspace filtering compares favourably to the well-known spectral subtraction technique.
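The decomposition described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`hankel_matrix`, `average_antidiagonals`, `subspace_filter`), the frame length and the number of columns are all assumptions chosen for illustration. A frame is arranged into a Hankel matrix, its singular values are scaled by a gain vector, and the result is mapped back to a signal by averaging anti-diagonals.

```python
import numpy as np

def hankel_matrix(frame, num_cols):
    """Arrange a 1-D frame into a Hankel matrix with H[i, j] = frame[i + j]."""
    num_rows = len(frame) - num_cols + 1
    idx = np.arange(num_rows)[:, None] + np.arange(num_cols)[None, :]
    return frame[idx]

def average_antidiagonals(matrix):
    """Map a matrix back to a 1-D signal by averaging each anti-diagonal."""
    num_rows, num_cols = matrix.shape
    out = np.zeros(num_rows + num_cols - 1)
    counts = np.zeros_like(out)
    for j in range(num_cols):
        # Column j of a Hankel matrix holds samples j .. j + num_rows - 1
        out[j:j + num_rows] += matrix[:, j]
        counts[j:j + num_rows] += 1
    return out / counts

def subspace_filter(frame, num_cols, gains):
    """Scale the singular values of the frame's Hankel matrix by `gains`
    (one gain per singular value) and resynthesise the frame."""
    H = hankel_matrix(np.asarray(frame, dtype=float), num_cols)
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    H_hat = (U * (s * gains)) @ Vt
    return average_antidiagonals(H_hat)
```

In this sketch, a gain vector of ones followed by zeros corresponds to explicit rank reduction, whereas a gain vector with no entries forced to zero keeps every component, which is the setting the abstract reports as best for recognition. With all gains equal to one, the frame is returned unchanged up to rounding.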


Perceptual audio modeling with exponentially damped sinusoids

Hermus, Verhelst, Lemmerling et al. 2005, Signal Processing


Assessment of signal subspace based speech enhancement for noise robust speech recognition

Hermus, Wambacq


Subspace filtering is an extensively studied technique that has proven very effective in speech enhancement for improving speech intelligibility. In this paper, we review different subspace estimation techniques (Minimum Variance, Least Squares, Singular Value Adaptation, Time Domain Constrained and Spectral Domain Constrained) in a modified singular value decomposition (SVD) framework, and investigate their capability to improve the noise robustness of speech recognisers. An extensive set of recognition experiments with the Resource Management (RM) database showed that significant reductions in word error rate (WER) can be obtained, both for the white noise and the coloured noise case. Unlike in speech enhancement, we found that the noisy signal subspace should not be truncated if recognition accuracy is to be optimised.
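The estimators named in this abstract differ in the gain they apply to each singular value. The sketch below covers three of them under a white-noise assumption, using the forms in which these gains are commonly quoted rather than anything taken from the paper itself. The function name `estimator_gains` and the parameter `noise_sv` (the singular value level attributed to noise) are hypothetical, and flooring the gains at zero is a choice made here to keep them non-negative.

```python
import numpy as np

def estimator_gains(singular_values, noise_sv, kind):
    """Per-singular-value gains for a white-noise subspace filter.

    kind: "ls"  (least squares)             -> gain 1
          "mv"  (minimum variance)          -> 1 - noise_sv^2 / s^2
          "sva" (singular value adaptation) -> sqrt(1 - noise_sv^2 / s^2)
    """
    s = np.asarray(singular_values, dtype=float)
    # Ratio of noise energy to total energy per component (inf where s == 0)
    ratio = np.divide(noise_sv ** 2, s ** 2,
                      out=np.full_like(s, np.inf), where=s > 0)
    # Assumption of this sketch: components below the noise level get gain 0
    speech_fraction = np.clip(1.0 - ratio, 0.0, 1.0)
    if kind == "ls":
        return np.ones_like(s)
    if kind == "mv":
        return speech_fraction
    if kind == "sva":
        return np.sqrt(speech_fraction)
    raise ValueError(f"unknown estimator kind: {kind!r}")
```

Applying these gains to every singular value, instead of first discarding all components beyond a chosen rank, corresponds to the untruncated setting that the abstract found best for recognition accuracy.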


Psycho-acoustic modeling of audio with exponentially damped sinusoids

Hermus, Verhelst, Wambacq, 2002


Estimation of the voicing cut-off frequency contour of natural speech based on harmonic and aperiodic energies

Hermus, Girin, Van hamme et al. 2008



A Review of Signal Subspace Speech Enhancement and Its Application to Noise Robust Speech Recognition
