2021
DOI: 10.48550/arxiv.2105.03010
Preprint

Efficient Weight Factorization for Multilingual Speech Recognition

Abstract: End-to-end multilingual speech recognition involves training a single model on a composite speech corpus covering many languages, resulting in a single neural network that handles transcription across different languages. Because each language in the training data has different characteristics, the shared network may struggle to optimize for all of the languages simultaneously. In this paper we propose a novel multilingual architecture that targets the core operation in neural networks: linear transformation functions. …
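The abstract is cut off, but the factorization idea it describes — decomposing each weight matrix into a shared component plus lightweight language-dependent factors — can be sketched briefly. The PyTorch module below is an illustrative reconstruction, not the paper's verbatim implementation; the class name, the rank-1 parameterization, and the initialization scheme are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FactorizedLinear(nn.Module):
    """Shared linear layer modulated by per-language rank-1 factors.

    Sketch of the weight-factorization idea: for language l, the
    effective weight is W_l = W * (s_l r_l^T) + (u_l v_l^T), so each
    language adds only O(d_in + d_out) parameters on top of the
    shared W. The exact parameterization here is an assumption.
    """
    def __init__(self, d_in, d_out, n_langs):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(d_out, d_in))
        nn.init.xavier_uniform_(self.weight)
        # Multiplicative factors start at ones so W_l == W initially.
        self.s = nn.Parameter(torch.ones(n_langs, d_out))
        self.r = nn.Parameter(torch.ones(n_langs, d_in))
        # Additive factors: u starts at zero (shift is zero initially),
        # v gets small random values so gradients can still reach u.
        self.u = nn.Parameter(torch.zeros(n_langs, d_out))
        self.v = nn.Parameter(torch.randn(n_langs, d_in) * 0.02)

    def forward(self, x, lang):
        # Rank-1 scale and shift for the requested language.
        scale = torch.outer(self.s[lang], self.r[lang])  # (d_out, d_in)
        shift = torch.outer(self.u[lang], self.v[lang])  # (d_out, d_in)
        return F.linear(x, self.weight * scale + shift)
```

Usage would look like `layer = FactorizedLinear(512, 512, n_langs=8); y = layer(x, lang=3)`: every language reuses the shared weight matrix, while the per-language factors stay cheap enough to apply to every linear transformation in the network.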

Cited by 1 publication (6 citation statements) | References 35 publications
“…We also provided further analysis of the architecture, showing that there are benefits either to improving the self-attention mechanism by adding relative positions during fine-tuning, or to stacking the MBART encoder on top of its wav2vec counterpart. This continues the line of work on building multilingual ASR systems based on sequence-to-sequence neural networks [17,5,16]. Our work is publicly available at https://github.com/quanpn90/NMTGMinor, providing a highly CUDA-optimized implementation of both wav2vec and MBART that is potentially useful for the community.…”
Section: Introduction (mentioning)
confidence: 87%
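For context on the first modification mentioned in this statement, one common way to add relative positions to self-attention is a learned bias on the attention logits (in the style of Shaw et al. or T5). The sketch below assumes that variant; the cited work may implement relative positions differently.

```python
import torch
import torch.nn as nn

class RelPositionBias(nn.Module):
    """Learned relative-position bias added to attention logits.

    One per-head bias for each clipped relative distance; the result
    is broadcast-added to the (n_heads, q_len, k_len) logits before
    softmax. This is a generic sketch, not the cited papers' exact
    mechanism.
    """
    def __init__(self, n_heads, max_dist=128):
        super().__init__()
        self.max_dist = max_dist
        self.bias = nn.Embedding(2 * max_dist + 1, n_heads)

    def forward(self, q_len, k_len):
        pos_q = torch.arange(q_len)[:, None]
        pos_k = torch.arange(k_len)[None, :]
        # Clip distances so far-apart positions share one bucket.
        rel = (pos_k - pos_q).clamp(-self.max_dist, self.max_dist) + self.max_dist
        # (q_len, k_len, n_heads) -> (n_heads, q_len, k_len)
        return self.bias(rel).permute(2, 0, 1)
```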
“…The motivation comes from the belief that some features are shared between languages while, at the same time, each language needs to be selectively represented, so networks are encouraged to change "modes" depending on the language being processed [23]. Since then, multilingual model designers have opted to use network components dedicated to each language, ranging from weight generators [24] to adapters [15,22] or, recently, adaptive weights that add scales and biases to each weight matrix in the whole architecture [16]. In this paper, the last two options are selected for investigation because they are computationally manageable.…”
Section: Language Adaptive Components (mentioning)
confidence: 99%
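To make the comparison in this statement concrete, here is a minimal bottleneck adapter of the kind cited ([15,22]); the bottleneck size, placement, and zero-initialization are assumptions rather than details taken from those papers.

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Per-language bottleneck adapter inserted after a transformer sub-layer.

    Minimal sketch in the spirit of the adapter approaches cited above:
    a down-projection, nonlinearity, and up-projection with a residual
    connection, typically instantiated once per language.
    """
    def __init__(self, d_model, bottleneck=64):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, bottleneck)
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck, d_model)
        # Zero-init the up-projection so the adapter starts as identity.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        # Residual bottleneck: cheap per-language specialization.
        return x + self.up(self.act(self.down(self.norm(x))))
```

The design trade-off the citing paper points at is visible here: adapters add a small per-language module at fixed points in the network, whereas adaptive weights (as in the sketch after the abstract) modulate every existing weight matrix directly.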