Interspeech 2020
DOI: 10.21437/interspeech.2020-2847

Multilingual Speech Recognition with Self-Attention Structured Parameterization

Cited by 20 publications (17 citation statements)
References 16 publications
“…On the one hand, the implementations often depend heavily on a particular architecture that is popular at the time, and the reported improvement tends to diminish when a new architecture emerges. For example, the language-specifically biased attention [5] modified the self-attention architecture [6] specifically under the assumption that each language can benefit from a bias added to the attention scores. On the other hand, the language-dependent components might require a considerable number of parameters and struggle to scale to the number of languages.…”
Section: Introduction
confidence: 99%
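For readers skimming these statements, the mechanism described in [5] amounts to adding a language-dependent term to the attention logits. The sketch below illustrates that idea in PyTorch; it is not the exact formulation from [5], and the class name, the per-(language, head) scalar bias, and all shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LanguageBiasedSelfAttention(nn.Module):
    """Self-attention with a learned per-language additive bias on the attention logits.

    Hedged sketch only: the bias parameterization (one scalar per language and head)
    is an assumption for illustration, not the formulation used in [5].
    """

    def __init__(self, d_model: int, n_heads: int, n_languages: int):
        super().__init__()
        self.n_heads = n_heads
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # One learned scalar bias per (language, head).
        self.lang_bias = nn.Parameter(torch.zeros(n_languages, n_heads))

    def forward(self, x: torch.Tensor, lang_id: int) -> torch.Tensor:
        # x: (batch, time, d_model); the bias is broadcast over all query/key positions.
        B, T, _ = x.shape
        bias = self.lang_bias[lang_id].view(self.n_heads, 1, 1).expand(self.n_heads, T, T)
        # A float attn_mask is added to the attention logits before the softmax,
        # so each head sees a language-dependent offset on its scores.
        out, _ = self.attn(x, x, x, attn_mask=bias.repeat(B, 1, 1))
        return out
```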
“…Although it is simple and maximizes the sharing across languages, it also brings confusion between languages during recognition. If the multilingual model can condition the encoder on the language identity (LID), a significant improvement can be obtained over the universal multilingual model without LID, because the one-hot LID guides the ASR model to generate the transcription of the target language and reduces the confusion from other languages [127,[129][130][131][132][133]. However, such a multilingual E2E model with LID relies on prior knowledge of which language the user will speak for every utterance, working more like a monolingual model.…”
Section: Multilingual Modeling
confidence: 99%
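The one-hot LID conditioning mentioned above is commonly realized by appending a language indicator to the encoder input. Below is a minimal sketch, assuming frame-level acoustic features and a simple linear projection; the module name and dimensions are illustrative and not taken from the cited systems.

```python
import torch
import torch.nn as nn

class LIDConditionedFrontend(nn.Module):
    """Concatenate a one-hot language ID to every acoustic frame before the encoder.

    Illustrative sketch of the general recipe; layer choices are assumptions.
    """

    def __init__(self, feat_dim: int, n_languages: int, d_model: int):
        super().__init__()
        self.n_languages = n_languages
        self.proj = nn.Linear(feat_dim + n_languages, d_model)

    def forward(self, feats: torch.Tensor, lang_id: int) -> torch.Tensor:
        # feats: (batch, time, feat_dim); the one-hot LID is tiled across all frames.
        B, T, _ = feats.shape
        one_hot = torch.zeros(B, T, self.n_languages, device=feats.device)
        one_hot[..., lang_id] = 1.0
        return self.proj(torch.cat([feats, one_hot], dim=-1))
```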
“…For all of these languages, the training and test data are anonymized and transcribed by humans. Similar to [9], we use two forms of data augmentation to mitigate overfitting and improve generalization, namely the noise- and reverberation-based augmentation detailed in [17] and SpecAugment [18].…”
Section: Languages and Data
confidence: 99%
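As a rough illustration of the SpecAugment-style augmentation cited here [18], the sketch below applies only frequency and time masking (time warping is omitted); the mask widths and counts are illustrative hyperparameters, not those used in the cited work.

```python
import torch

def spec_augment(spec: torch.Tensor, freq_mask: int = 27, time_mask: int = 100,
                 n_freq_masks: int = 2, n_time_masks: int = 2) -> torch.Tensor:
    """Apply random frequency and time masks to a (time, freq) log-mel spectrogram."""
    spec = spec.clone()
    T, F = spec.shape
    # Zero out a few randomly placed frequency bands.
    for _ in range(n_freq_masks):
        f = int(torch.randint(0, freq_mask + 1, ()).item())
        f0 = int(torch.randint(0, max(1, F - f), ()).item())
        spec[:, f0:f0 + f] = 0.0
    # Zero out a few randomly placed spans of frames.
    for _ in range(n_time_masks):
        t = int(torch.randint(0, time_mask + 1, ()).item())
        t0 = int(torch.randint(0, max(1, T - t), ()).item())
        spec[t0:t0 + t, :] = 0.0
    return spec
```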
“…A variety of approaches have been explored for changing the structure of the neural network model to make it more amenable to multilingual modeling. In the context of encoder-decoder models, [5] used adapter layers to account for different amounts of available data per language, [9] parameterized the attention heads of a Transformer-based encoder to be per-language, while [6] showed that a multi-decoder multilingual model, where each decoder is assigned to a cluster of languages, can achieve good performance. Since the introduction of Mixture of Experts (MOE) in [10], these models have found popularity in machine translation [8], and speech recognition [11,12].…”
Section: Introduction
confidence: 99%
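Of the structural approaches listed above, per-language residual adapters [5] are perhaps the simplest to sketch. The module below is a generic, hedged illustration: one small bottleneck per language inserted after a shared layer, with names and dimensions chosen for clarity rather than taken from any cited paper. The residual connection keeps the shared multilingual representation intact, so only the selected language's small adapter is applied per utterance.

```python
import torch
import torch.nn as nn

class LanguageAdapter(nn.Module):
    """Per-language residual bottleneck adapters (illustrative sketch)."""

    def __init__(self, d_model: int, bottleneck: int, n_languages: int):
        super().__init__()
        # One small adapter per language; only the selected one is used per utterance.
        self.adapters = nn.ModuleList(
            nn.Sequential(nn.LayerNorm(d_model),
                          nn.Linear(d_model, bottleneck),
                          nn.ReLU(),
                          nn.Linear(bottleneck, d_model))
            for _ in range(n_languages)
        )

    def forward(self, x: torch.Tensor, lang_id: int) -> torch.Tensor:
        # Residual connection: shared representation plus a language-specific correction.
        return x + self.adapters[lang_id](x)
```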