2010 IEEE International Conference on Acoustics, Speech and Signal Processing
DOI: 10.1109/icassp.2010.5495567

Non-negative matrix factorization as noise-robust feature extractor for speech recognition

Abstract: We introduce a novel approach for noise-robust feature extraction in speech recognition, based on non-negative matrix factorization (NMF). While NMF has previously been used for speech denoising and speaker separation, we directly extract time-varying features from the NMF output. To this end we extend basic unsupervised NMF to a hybrid supervised/unsupervised algorithm. We present a Dynamic Bayesian Network (DBN) architecture that can exploit these features in a Tandem manner together with the maximum likelih…

Cited by 36 publications (53 citation statements) · References 12 publications
“…As for pre-processing, the audio could be (blindly) separated into multiple sources, such as by non-negative matrix factorisation (NMF) [26,27]. In fact, semi-supervised or supervised NMF-type activation features would allow for another type of higher-level audio feature, such as the degree of presence of car events or similar [28]. Further, modelling of temporal context seems promising, such as by long short-term memory recurrent architectures [29], in particular when routes are analysed.…”
Section: Results · Citation type: mentioning · Confidence: 99%
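The semi-supervised activation features mentioned above can be sketched as follows: a dictionary W of spectral atoms is trained in advance and held fixed, and only the activation matrix H is updated at test time, so that the columns of H serve as time-varying features. This is a minimal illustrative sketch under the generalised KL divergence; the function name and parameters are hypothetical, not from the cited papers.

```python
import numpy as np

def nmf_activations(V, W, n_iter=100, eps=1e-9):
    """Supervised NMF feature extraction (illustrative sketch):
    W (spectral dictionary) is fixed; only the activations H are
    updated with the multiplicative KL-divergence rule.  Each column
    of H is then a per-frame activation feature vector."""
    H = np.full((W.shape[1], V.shape[1]), 1.0 / W.shape[1])
    ones = np.ones_like(V)
    for _ in range(n_iter):
        WH = W @ H + eps                      # current model of the spectrogram
        H *= (W.T @ (V / WH)) / (W.T @ ones + eps)  # multiplicative H-update
    return H
```

Because the KL divergence is convex in H for fixed W, these updates reliably recover the activations when the observed spectrogram is well explained by the dictionary.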
“…In this work, we consider the KL divergence because it has recently been used with good results in speech processing tasks such as sound source separation [4], speech enhancement [3], and feature extraction [6]. In order to find a local optimum of the KL divergence between V and (W H), an iterative scheme with multiplicative update rules can be used, as proposed in [8] and stated in (3)…
Section: Non-negative Matrix Factorization (NMF) · Citation type: mentioning · Confidence: 99%
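The multiplicative update scheme referred to above (Lee & Seung's rules for the generalised KL divergence D(V ‖ WH)) can be sketched as below. This is a minimal reference implementation under stated assumptions, not the exact algorithm of any cited paper; the function name, initialisation, and stopping rule are illustrative.

```python
import numpy as np

def nmf_kl(V, rank, n_iter=300, eps=1e-9, seed=0):
    """NMF minimising the generalised KL divergence D(V || WH)
    via alternating multiplicative updates (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ ones + eps)   # activation update
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (ones @ H.T + eps)   # dictionary update
    return W, H
```

The updates keep W and H non-negative by construction and never increase the KL divergence, which is why they are the standard workhorse for the audio applications cited here.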
“…In this sense, the method for speaker separation proposed in [5] introduces a penalty term in the NMF with Euclidean distance that allows control of the sparsity of the solution. However, recent NMF-based techniques in speech processing report better results using NMF with the KL divergence [6], [4]. For this reason, in this paper, we propose an NMF-based method for speech denoising which combines the use of the KL divergence with sparseness constraints, following the procedure described in [7].…”
Section: Introduction · Citation type: mentioning · Confidence: 99%
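A common way to combine the KL divergence with a sparseness constraint, as described above, is to add an L1 penalty λ·Σ H to the objective; in the multiplicative scheme this simply adds λ to the denominator of the activation update. The sketch below shows one such update step; exact formulations vary between papers, so treat this as an illustrative heuristic rather than the method of [7].

```python
import numpy as np

def sparse_nmf_kl_step(V, W, H, lam=0.1, eps=1e-9):
    """One multiplicative H-update for KL-divergence NMF with an L1
    sparseness penalty lam * sum(H) (illustrative sketch; formulations
    differ across the literature)."""
    WH = W @ H + eps
    ones = np.ones_like(V)
    # the gradient of the L1 penalty (lam) enters the denominator,
    # shrinking all activations toward zero
    return H * (W.T @ (V / WH)) / (W.T @ ones + lam + eps)
```

Larger λ shrinks the activations harder, trading reconstruction accuracy for sparser, more interpretable solutions.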
“…models which describe the magnitude spectra of complex sounds as purely additive (no negative components) combinations of spectral atoms have proven to be adept at separating the target speech from interfering sounds such as noise [1,2], other speakers [3,4], music [5,6,7], and even reverberation [8]. For noise-robust automatic speech recognition (ASR), such compositional models really excel when the atoms also have some temporal extent [9,10].…”
Section: Introduction · Citation type: mentioning · Confidence: 99%