2013
DOI: 10.1016/j.csl.2012.05.002

Noise robust ASR in reverberated multisource environments applying convolutive NMF and Long Short-Term Memory

Abstract: This article proposes and evaluates various methods to integrate the concept of bidirectional Long Short-Term Memory (BLSTM) temporal context modeling into a system for automatic speech recognition (ASR) in noisy and reverberated environments. Building on recent advances in Long Short-Term Memory architectures for ASR, we design a novel front-end for context-sensitive Tandem feature extraction and show how the Connectionist Temporal Classification approach can be used as a BLSTM-based back-end, alternatively to…

Cited by 10 publications (5 citation statements)
References 39 publications
“…More recently, Long-Short Term Memory (LSTM) networks [18] were introduced as an alternative RNN architecture that should accommodate a better control over the amount of memory to consider during training. The currently deployed deep LSTM networks seem to show an advantage over feedforward DNNs for phone recognition as well as for LVCSR [19]- [21] and reinforce our claim that recurrent structures are worthwhile to investigate.…”
Section: Introduction (supporting)
confidence: 79%
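The quoted passage above credits LSTM networks with better control over how much memory to keep during training. As an illustration only (not code from the cited works), a single-unit LSTM cell step in plain Python shows the input, forget, and output gates that implement that control; the weights used here are hypothetical toy values.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, W):
    """One forward step of a single-unit LSTM cell.

    The gates (input i, forget f, output o) decide how much of the
    memory cell c to write, keep, and expose -- the 'control over the
    amount of memory' the quoted passage refers to. W maps a gate name
    to scalar weights (w_x, w_h, bias), kept scalar for readability.
    """
    i = sigmoid(W["i"][0] * x + W["i"][1] * h_prev + W["i"][2])
    f = sigmoid(W["f"][0] * x + W["f"][1] * h_prev + W["f"][2])
    o = sigmoid(W["o"][0] * x + W["o"][1] * h_prev + W["o"][2])
    g = math.tanh(W["g"][0] * x + W["g"][1] * h_prev + W["g"][2])
    c = f * c_prev + i * g   # forget part of the old memory, write new content
    h = o * math.tanh(c)     # expose a gated view of the memory cell
    return h, c

# Run a toy input sequence through the cell, carrying (h, c) forward.
W = {k: (0.5, 0.5, 0.0) for k in ("i", "f", "o", "g")}
h, c = 0.0, 0.0
for x in (1.0, -1.0, 0.5):
    h, c = lstm_cell_step(x, h, c, W)
```

A bidirectional LSTM, as used in the article, runs one such recurrence forward and a second one backward over the sequence and concatenates the two hidden states per frame.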
“…The underlying idea is also in line with the expectation in the human community that one's suggestion can often ameliorate the others' judgement. Moreover, the idea is also inspired by a tandem structure for Automatic Speech Recognition (ASR) [14], where the phoneme predicted by neural networks is considered as an additional attribute for a Gaussian Mixture Model (GMM). In the present paper, we develop a prediction-based learning framework for the regression task of emotion recognition in speech.…”
Section: Introduction (mentioning)
confidence: 99%
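The tandem structure mentioned in this quote can be sketched as follows: network outputs are turned into phoneme posteriors and appended to the acoustic feature vector, which the GMM then models as extra observation dimensions. This is an illustrative sketch, not code from the cited paper; the feature values and phoneme scores are made up.

```python
import math

def softmax(scores):
    """Convert raw network scores into a posterior distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def tandem_features(acoustic, phone_scores):
    """Tandem feature construction: concatenate acoustic features with
    neural-network phoneme posteriors, so the posteriors become
    additional attributes for the GMM-based acoustic model."""
    return acoustic + softmax(phone_scores)

# Toy frame: 3 MFCC-like values plus raw scores for 4 hypothetical phonemes.
frame = tandem_features([12.1, -3.4, 0.7], [2.0, 0.5, -1.0, 0.1])
```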
“…To model the context needed for compensating late reverberation, we use deep bidirectional Long Short-Term Memory (LSTM) recurrent neural networks (RNNs), which deliver state-of-the-art performance in ASR [4,22], also in real reverberated and noisy speech [23], and feature enhancement [3]. In the LSTM approach, de-reverberated features ỹ_t are computed from a sequence of observed speech features x_t, t = 1, ….”
Section: BLSTM De-noising Autoencoders (mentioning)
confidence: 99%
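The de-noising autoencoder described in this quote learns a mapping from reverberated features x_t to de-reverberated targets ỹ_t by minimising a reconstruction error. The cited work uses deep BLSTM networks for this mapping; purely to illustrate the training objective, the sketch below fits a single scalar weight by gradient descent on the mean squared error, with made-up feature values.

```python
def train_enhancer(xs, ys, lr=0.1, epochs=100):
    """Fit y ~ w * x by gradient descent on mean squared error.

    A stand-in for the enhancement network: xs are observed
    (reverberated) features, ys the clean targets. Gradient of
    mean((w*x - y)^2) with respect to w is mean(2*(w*x - y)*x).
    """
    w = 0.0
    for _ in range(epochs):
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w

# Toy data: clean features are the reverberated ones scaled by 0.5.
xs = [1.0, 2.0, 3.0]
ys = [0.5, 1.0, 1.5]
w = train_enhancer(xs, ys)
```

The scalar weight converges to the scaling that relates the toy clean and reverberated features; a BLSTM replaces this scalar map with a recurrent, context-sensitive one.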
“…In this study, we address ASR in reverberant environments with limited amounts of stationary noise. There has been considerable progress in robustness of ASR by databased methods such as training with noisy data from various acoustic environments (multi-condition training), new acoustic modeling techniques such as deep neural networks [1], feature enhancement such as by de-noising auto-encoders [2,3], and combinations of these [4]. However, a problem with such data-based approaches is generalization to acoustic environments which are not known at training time.…”
Section: Introduction (mentioning)
confidence: 99%