2018
DOI: 10.48550/arxiv.1810.03459
Preprint
Multilingual sequence-to-sequence speech recognition: architecture, transfer learning, and language modeling



Cited by 4 publications (5 citation statements)
References 12 publications
“…In the MLASR transfer learning scenario, the base MLASR model was trained exactly the same as in [25]. The base model was trained using 10 selected Babel languages, roughly 640 hours of data: Cantonese, Bengali, Pashto, Turkish, Vietnamese, Haitian, Tamil, Kurmanji, Tok Pisin, and Georgian.…”
Section: Methods
mentioning confidence: 99%
“…The model parameters were then fine-tuned using the full Swahili corpus in Babel, which is about 40 hours. During the transfer process, we used the same MLASR base model in three different ways: 2-stage transfer (see [25] for more details), cold fusion, and cell control fusion 3 (affine). We included cold fusion in this comparison since cold fusion showed its effectiveness in domain adaptation in [17].…”
Section: Methods
mentioning confidence: 99%
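The 2-stage transfer this excerpt describes (stage 1: train a multilingual base model; stage 2: swap in a target-language output layer and fine-tune on target data) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the parameter shapes, vocabulary sizes, and dictionary layout are assumptions chosen only to show which parameters are reused and which are reinitialized.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical multilingual base model: a shared encoder projection plus an
# output projection over the joint multilingual grapheme set (shapes invented).
base = {
    "encoder": rng.standard_normal((40, 320)),   # acoustic features -> hidden
    "output":  rng.standard_normal((320, 500)),  # hidden -> multilingual vocab
}

def two_stage_transfer(base_model, target_vocab_size, seed=1):
    """Stage 2 setup: copy the shared encoder from the multilingual base,
    but reinitialize the output layer for the target language's vocabulary.
    All parameters would then be fine-tuned on target-language data."""
    rng = np.random.default_rng(seed)
    return {
        # Reused as-is from the multilingual base model.
        "encoder": base_model["encoder"].copy(),
        # Freshly initialized: the target language (e.g. Swahili) has its
        # own, smaller grapheme inventory than the joint multilingual set.
        "output": 0.01 * rng.standard_normal(
            (base_model["encoder"].shape[1], target_vocab_size)
        ),
    }

# Hypothetical target vocabulary size for the new language.
swahili_model = two_stage_transfer(base, target_vocab_size=60)
```

The point of the sketch is the split: encoder parameters carry over the multilingual acoustic knowledge, while the output layer must match the new language's label set before fine-tuning begins.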
“…As [15] has shown, the architecture we employ adheres to the latency constraints required for interactive applications. In contrast, prior E2E multilingual work has been limited to attention-based models that do not admit a straightforward streaming implementation [10][11][12][13].…”
Section: *Equal Contribution
mentioning confidence: 99%
“…More recently, end-to-end (E2E) multilingual systems have gained traction as a way to further simplify the training and serving of such models. These models replace the acoustic, pronunciation, and language models of n different languages with a single model while continuing to show improved performance over monolingual E2E systems [10][11][12][13]. Even as these E2E systems have shown promising results, it has not been conclusively demonstrated that they can be competitive with state-of-the-art conventional models, nor that they can do so while still operating within the real-time constraints of interactive applications such as a speech-enabled assistant.…”
Section: Introduction
mentioning confidence: 99%