Interspeech 2021
DOI: 10.21437/interspeech.2021-1075

Layer-Wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition

Abstract: Accent variability has posed a huge challenge to automatic speech recognition (ASR) modeling. Although one-hot accent vector based adaptation systems are commonly used, they require prior knowledge about the target accent and cannot handle unseen accents. Furthermore, simply concatenating accent embeddings does not make good use of accent knowledge, which has limited improvements. In this work, we aim to tackle these problems with a novel layer-wise adaptation structure injected into the E2E ASR model encoder.…

Cited by 14 publications (3 citation statements)
References 28 publications
“…proposed a MixNet‐based architecture to compensate for phonetic and accent variabilities by using MoE. Gong [8] et al. proposed a layer‐wise adapter in which each expert learns the features of the inputs and finally merges the features from others.…”
Section: Related Work
mentioning
confidence: 99%
“…Introduction: As one of the essential technologies of human-computer interaction, end-to-end automatic speech recognition (ASR) has achieved remarkable performance in recent research [1][2][3]. While in real life, either the subjective factors of speakers or the objective environment degrade the performance of ASR [4][5][6][7][8]. This study focuses on one of the subjective factors: accented speech.…”
mentioning
confidence: 99%