2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2015
DOI: 10.1109/icassp.2015.7178819
Exemplar-based speech enhancement for deep neural network based automatic speech recognition

Abstract: Deep neural network (DNN) based acoustic modelling has been successfully used for a variety of automatic speech recognition (ASR) tasks, thanks to its ability to learn higher-level information using multiple hidden layers. This paper investigates the recently proposed exemplar-based speech enhancement technique using coupled dictionaries as a pre-processing stage for DNN-based systems. In this setting, the noisy speech is decomposed as a weighted sum of atoms in an input dictionary containing exemplars sampled…

Cited by 23 publications (19 citation statements) · References 17 publications
“…Recently, deep-learning-based SE approaches have received increased attention and it has been confirmed that they yield better performances than traditional methods in many tasks [10,11,12]. Because of the deep structure, the deep-learning-based models can effectively characterize the complex transformation of noisy speech to clean speech, or they can precisely estimate a mask to filter out noise components from the noisy speech.…”
Section: Introduction
confidence: 99%
“…While Huang et al applied a simple CNN on the FBANK features, others applied either a sophisticated feature extraction method [12,136] or a refined CNN architecture [32]. We see that the ARMA feature set outperforms both these approaches by a large margin.…”
Section: Results With Mel-spectral Features
confidence: 83%
“…In the latter comparison, it is also noteworthy that while Martinez et al reported a training time of 4 days for their DNNs on an NVIDIA Tesla K20C GPU, our neural net training times were below 12 hours on a GPU of equal performance (GeForce GTX 770) [1]. Lastly, the inclusion of Delta-like coefficients meant that the proposed framework could also attain lower WERs than those reported by Baby et al using a sophisticated speech enhancement technique [12].…”
Section: Experiments On Clean Speech Using the TIMIT Corpus
confidence: 90%