ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9414027

An Investigation of End-to-End Models for Robust Speech Recognition

Abstract: End-to-end models for robust automatic speech recognition (ASR) have not been sufficiently well-explored in prior work. With end-to-end models, one could choose to preprocess the input speech using speech enhancement techniques and train the model using enhanced speech. Another alternative is to pass the noisy speech as input and modify the model architecture to adapt to noisy speech. A systematic comparison of these two approaches for end-to-end robust ASR has not been attempted before. We address this gap an…
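To make the contrast in the abstract concrete, the sketch below shows the two pipelines side by side: enhancing the speech before recognition versus feeding noisy speech directly to a model adapted for it. The Enhancer and ASRModel classes are hypothetical toy modules for illustration only, not the architectures studied in the paper.

```python
# Illustrative sketch (hypothetical modules, not the paper's exact models):
# two ways to build a robust end-to-end ASR pipeline.
import torch
import torch.nn as nn

class Enhancer(nn.Module):
    """Toy speech-enhancement front-end: maps a noisy waveform to a 'clean' waveform."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv1d(1, 1, kernel_size=9, padding=4)

    def forward(self, wav):                           # wav: (batch, 1, samples)
        return self.net(wav)

class ASRModel(nn.Module):
    """Toy end-to-end ASR encoder producing per-frame token log-probabilities."""
    def __init__(self, vocab_size=29):
        super().__init__()
        self.frontend = nn.Conv1d(1, 64, kernel_size=400, stride=160)  # crude framing
        self.rnn = nn.GRU(64, 128, batch_first=True, bidirectional=True)
        self.out = nn.Linear(256, vocab_size)

    def forward(self, wav):                           # wav: (batch, 1, samples)
        feats = self.frontend(wav).transpose(1, 2)    # (batch, frames, 64)
        hidden, _ = self.rnn(feats)
        return self.out(hidden).log_softmax(dim=-1)   # (batch, frames, vocab)

noisy = torch.randn(2, 1, 16000)      # 1 s of 16 kHz "noisy" audio

# Approach 1: enhance first, then recognize (ASR trained on enhanced speech).
enhancer, asr_on_enhanced = Enhancer(), ASRModel()
log_probs_a = asr_on_enhanced(enhancer(noisy))

# Approach 2: feed noisy speech directly to an ASR model adapted to noise
# (e.g. trained on noisy data or with a modified architecture).
asr_on_noisy = ASRModel()
log_probs_b = asr_on_noisy(noisy)
```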

Cited by 13 publications (6 citation statements) · References 16 publications
“…LibriSpeech: In order to facilitate a fair comparison with existing approaches, the data usage throughout experiments keeps exactly the same as that in [48]. Specifically, we utilize the LibriSpeech [49] real recorded and 1640 simulated noisy utterances, and the test set contains 1320 real recorded and 1320 simulated noisy utterances.…”
Section: A. Dataset Description (mentioning)
confidence: 99%
“…Comparison methods: The Baseline in [48] utilizes the Deepspeech2 model [53] for training on the LibriSpeech train-clean-100 dataset with a CTC objective function and evaluates on different test sets. For completeness, the time-domain DEMUCS as a front-end SE step in [48] will be compared, where the enhanced speech is directly used for ASR.…”
Section: A. Evaluation of the Proposed Ew2 (mentioning)
confidence: 99%
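The baseline in this excerpt trains DeepSpeech2 on LibriSpeech train-clean-100 with a CTC objective. As a minimal sketch of what training against a CTC objective involves (the tensor shapes, batch size, and 29-symbol vocabulary below are assumptions for illustration, not details from [48] or [53]):

```python
# Minimal CTC-objective sketch (assumed shapes and vocabulary, not the cited setup).
import torch
import torch.nn as nn

vocab_size = 29                        # e.g. 26 letters + space + apostrophe + blank
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)

T, N, S = 120, 4, 30                   # frames per utterance, batch size, target length
log_probs = torch.randn(T, N, vocab_size, requires_grad=True).log_softmax(dim=-1)  # (T, N, C)

# Target transcripts as label indices in [1, vocab_size - 1], padded to length S.
targets = torch.randint(1, vocab_size, (N, S), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()                        # in a real setup, gradients flow into the acoustic model
print(float(loss))
```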
“…Although enhanced speech has been greatly improved for human listening [6][7][8][9], the enhanced feature distribution also changes, which may help listeners but is not always beneficial for ASR. To make the ASR model adapt to the enhanced features, it can be retrained on data processed by the enhancement model [10]. However, the performance of this method is severely affected by the SE effect.…”
Section: Introduction (mentioning)
confidence: 99%
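The excerpt above describes retraining the ASR model on data processed by the enhancement model so that it adapts to the enhanced feature distribution [10]. A minimal sketch of one such retraining step, using hypothetical stand-in modules rather than the cited systems, could look as follows:

```python
# Sketch of retraining an ASR model on SE-processed speech so it adapts to the
# enhanced feature distribution (hypothetical stand-in modules, not the method of [10]).
import torch
import torch.nn as nn

class TinyASR(nn.Module):
    """Stand-in ASR: waveform -> per-frame log-probabilities over a small vocabulary."""
    def __init__(self, vocab_size=29):
        super().__init__()
        self.frontend = nn.Conv1d(1, 64, kernel_size=400, stride=160)
        self.rnn = nn.GRU(64, 128, batch_first=True)
        self.out = nn.Linear(128, vocab_size)

    def forward(self, wav):                           # (batch, 1, samples)
        feats = self.frontend(wav).transpose(1, 2)    # (batch, frames, 64)
        hidden, _ = self.rnn(feats)
        return self.out(hidden).log_softmax(-1)       # (batch, frames, vocab)

enhancer = nn.Conv1d(1, 1, kernel_size=9, padding=4)  # stand-in for a pretrained SE model
for p in enhancer.parameters():
    p.requires_grad_(False)                           # the enhancer stays fixed

asr = TinyASR()
ctc = nn.CTCLoss(blank=0, zero_infinity=True)
optim = torch.optim.Adam(asr.parameters(), lr=1e-4)

# One retraining step: noisy speech -> enhancement -> ASR -> CTC loss -> update ASR only.
noisy = torch.randn(2, 1, 16000)
targets = torch.randint(1, 29, (2, 20), dtype=torch.long)
target_lengths = torch.full((2,), 20, dtype=torch.long)

with torch.no_grad():
    enhanced = enhancer(noisy)                        # enhanced speech replaces the noisy input
log_probs = asr(enhanced)                             # (batch, frames, vocab)
frames = torch.full((2,), log_probs.size(1), dtype=torch.long)
loss = ctc(log_probs.transpose(0, 1), targets, frames, target_lengths)
optim.zero_grad()
loss.backward()
optim.step()
print(float(loss))
```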