2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2018.8462161

Speaker Adaptation for Multichannel End-to-End Speech Recognition

Abstract: The field of speech recognition is in the midst of a paradigm shift: end-to-end neural networks are challenging the dominance of hidden Markov models as a core technology. Using an attention mechanism in a recurrent encoder-decoder architecture solves the dynamic time alignment problem, allowing joint end-to-end training of the acoustic and language modeling components. In this paper we extend the end-to-end framework to encompass microphone array signal processing for noise suppression and speech enhancement …
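The abstract's point that attention solves the dynamic time alignment problem can be illustrated with a minimal sketch of a single decoder step. It assumes additive (content-based) attention with randomly initialized parameters; the abstract does not say which attention variant the paper uses, and the function and parameter names below are hypothetical.

```python
import numpy as np


def additive_attention(enc, dec_state, W, V, w):
    """One decoder step of content-based (additive) attention.

    enc:       (T, D) encoder states, one per input frame
    dec_state: (S,)   current decoder state
    W: (A, S), V: (A, D), w: (A,) hypothetical projection parameters
    Returns the alignment weights (T,) and the context vector (D,).
    """
    # Score every encoder frame against the current decoder state.
    scores = np.tanh(enc @ V.T + W @ dec_state) @ w    # (T,)
    # A numerically stable softmax over time gives a soft alignment summing to 1.
    scores = scores - scores.max()
    weights = np.exp(scores) / np.exp(scores).sum()
    # The context vector is the alignment-weighted average of encoder states.
    context = weights @ enc                             # (D,)
    return weights, context


rng = np.random.default_rng(0)
T, D, S, A = 50, 8, 6, 4
enc = rng.standard_normal((T, D))
weights, context = additive_attention(
    enc,
    rng.standard_normal(S),
    rng.standard_normal((A, S)),
    rng.standard_normal((A, D)),
    rng.standard_normal(A),
)
print(weights.shape, context.shape)  # (50,) (8,)
```

Because the alignment is a differentiable weighting recomputed at every output step, no explicit frame-to-label alignment (as an HMM would provide) is needed during training.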

Cited by 94 publications (106 citation statements)
References 31 publications
“…We used a similar architecture as in [23], where the masking network and the neural beamformer are integrated into an attention-based encoder-decoder neural network, and the whole model is jointly optimized solely via a speech recognition objective. The input of the model can consist of an arbitrary number of channels C, and its output is the text sequence for each speaker directly.…”
Section: End-to-end Multi-channel Multi-speaker ASR (mentioning)
confidence: 99%
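The first excerpt describes a masking network feeding a neural beamformer. As a hedged illustration, the sketch below assumes the masks have already been predicted and uses one common mask-based MVDR formulation (Souden-style, with a one-hot reference-microphone vector). The excerpt does not specify the beamformer, so `masked_psd`, `mvdr_souden`, the diagonal-loading term `eps`, and the array shapes are all assumptions.

```python
import numpy as np


def masked_psd(X, mask):
    """Mask-weighted spatial covariance matrix for every frequency bin.

    X: (C, T, F) complex multichannel STFT; mask: (T, F) values in [0, 1].
    Returns an array of shape (F, C, C).
    """
    # Sum over frames of mask[t, f] * x[:, t, f] x[:, t, f]^H.
    psd = np.einsum("tf,ctf,dtf->fcd", mask, X, X.conj())
    return psd / np.maximum(mask.sum(axis=0), 1e-10)[:, None, None]


def mvdr_souden(X, speech_mask, noise_mask, ref_mic=0, eps=1e-6):
    """Souden-style MVDR: w(f) = (Phi_N^-1 Phi_S) u / trace(Phi_N^-1 Phi_S)."""
    C = X.shape[0]
    phi_s = masked_psd(X, speech_mask)
    # Small diagonal loading (an assumption) keeps Phi_N invertible.
    phi_n = masked_psd(X, noise_mask) + eps * np.eye(C)
    num = np.linalg.solve(phi_n, phi_s)                 # Phi_N^-1 Phi_S, (F, C, C)
    trace = np.trace(num, axis1=1, axis2=2)             # (F,)
    u = np.zeros(C)
    u[ref_mic] = 1.0                                    # one-hot reference microphone
    w = (num @ u) / trace[:, None]                      # beamformer weights, (F, C)
    # Enhanced single-channel STFT: s_hat[t, f] = w[f]^H x[:, t, f].
    return np.einsum("fc,ctf->tf", w.conj(), X)


rng = np.random.default_rng(0)
C, T, F = 4, 100, 65
X = rng.standard_normal((C, T, F)) + 1j * rng.standard_normal((C, T, F))
speech_mask = rng.uniform(size=(T, F))
enhanced = mvdr_souden(X, speech_mask, 1.0 - speech_mask)
print(enhanced.shape)  # (100, 65)
```

The masks enter only through weighted averages and linear algebra, so in a framework with automatic differentiation the path from masks to the enhanced signal is differentiable, which is what makes optimizing the mask network solely through a recognition loss possible.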
“…There have been few efforts on the adaptation of the E2E systems. [22] proposed a multi-path adaptation scheme for end-to-end multichannel ASR. In [23], the authors addressed the data sparsity issue by formulating Kullback-Leibler divergence (KLD) regularization and multi-task learning approaches for speaker adaptation of CTC models.…”
Section: Related Work (mentioning)
confidence: 99%
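KLD regularization, mentioned in the third excerpt, keeps the adapted model's output distribution close to that of the speaker-independent (SI) model, so that a small amount of adaptation data does not lead to overfitting. The sketch below assumes a frame-level cross-entropy objective as a stand-in for the CTC objective referred to in [23]; the interpolation weight `rho` and all function names are hypothetical.

```python
import numpy as np


def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def kld_regularized_loss(adapted_logits, si_probs, labels, rho):
    """(1 - rho) * CE(labels, adapted) + rho * KL(si || adapted), averaged over frames."""
    log_p = log_softmax(adapted_logits)                               # (N, K)
    ce = -log_p[np.arange(len(labels)), labels]                       # (N,)
    log_si = np.log(np.clip(si_probs, 1e-12, None))                   # guard against log(0)
    kl = (si_probs * (log_si - log_p)).sum(axis=-1)                   # (N,)
    return np.mean((1.0 - rho) * ce + rho * kl)


def interpolated_target_ce(adapted_logits, si_probs, labels, rho):
    """Cross-entropy against the target (1 - rho) * one_hot(labels) + rho * si_probs."""
    log_p = log_softmax(adapted_logits)
    target = rho * si_probs                                           # new array; si_probs untouched
    target[np.arange(len(labels)), labels] += 1.0 - rho
    return np.mean(-(target * log_p).sum(axis=-1))


rng = np.random.default_rng(0)
N, K, rho = 10, 5, 0.5
si_probs = np.exp(log_softmax(rng.standard_normal((N, K))))          # SI model posteriors
labels = rng.integers(0, K, size=N)
logits = rng.standard_normal((N, K))                                  # adapted model outputs
l_kld = kld_regularized_loss(logits, si_probs, labels, rho)
l_tgt = interpolated_target_ce(logits, si_probs, labels, rho)
```

The two losses differ only by `rho` times the entropy of the SI posteriors, which does not depend on the adapted model, so they produce the same gradients. This is why KLD regularization can be implemented simply by interpolating the training targets.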
“…Although it achieved excellent performance as the front-end of ASR [8][9][10], it has a few drawbacks. First, the DNN was often trained to minimize the estimation error of T-F masks instead of maximizing the quality of the estimated signal directly [11,12]. In addition, its performance is limited under noisy and reverberant environment because it does not consider non-stationary characteristics of speech signal [13].…”
Section: Introduction (mentioning)
confidence: 99%
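The fourth excerpt contrasts training on mask estimation error with optimizing the estimated signal directly. The sketch below assumes an ideal ratio mask (IRM) with exponent 0.5 as the target and mean squared error for both losses; these choices are not stated in the excerpt, and the function names are hypothetical. It shows that two mask estimates with the same mask error can have very different signal errors, because the signal-domain loss weights each time-frequency bin by the mixture magnitude.

```python
import numpy as np


def ideal_ratio_mask(S, N):
    """IRM with exponent 0.5: sqrt(|S|^2 / (|S|^2 + |N|^2)), values in [0, 1]."""
    ps, pn = np.abs(S) ** 2, np.abs(N) ** 2
    return np.sqrt(ps / np.maximum(ps + pn, 1e-12))


def mask_approximation_loss(mask_est, S, N):
    # Error on the mask itself: every T-F bin counts equally, whatever its energy.
    return np.mean((mask_est - ideal_ratio_mask(S, N)) ** 2)


def signal_approximation_loss(mask_est, S, Y):
    # Error on the enhanced magnitude: bins with a louder mixture weigh more.
    return np.mean((mask_est * np.abs(Y) - np.abs(S)) ** 2)


# Toy 1 x 2 spectrogram: one loud bin and one quiet bin.
S = np.array([[4.0, 0.5]])
N = np.array([[3.0, 0.5]])
Y = S + N
irm = ideal_ratio_mask(S, N)
err_loud = irm + np.array([[0.1, 0.0]])    # mask error placed in the loud bin
err_quiet = irm + np.array([[0.0, 0.1]])   # same mask error placed in the quiet bin
for name, m in [("loud", err_loud), ("quiet", err_quiet)]:
    print(name, mask_approximation_loss(m, S, N), signal_approximation_loss(m, S, Y))
```

Both estimates have a mask loss of 0.005, yet the signal loss is roughly 2.67 for the loud-bin error versus roughly 1.33 for the quiet-bin error: minimizing mask error alone does not directly target the quality of the estimated signal.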