2019
DOI: 10.48550/arxiv.1909.05330
Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model

Cited by 20 publications (32 citation statements)
References 0 publications
“…Datta et al. [21] unified different writing systems through a many-to-one transliteration transducer. Recently, large-scale multilingual ASR systems have been investigated [3], [4], [6], [10], [22]. Pratap et al. [3] proposed jointly training on 16,000 hours of speech data across 51 languages with up to 1 billion parameters.…”
Section: A. Multilingual and Cross-lingual Speech Recognition (mentioning, confidence: 99%)
“…Some researchers have proposed applying Adapters to E2E ASR tasks. In [10], Kannan et al. propose using adapters to handle the data-imbalance problem in large-scale multilingual ASR. After obtaining a model trained on the union of data from all languages, they train language-dependent adapters on each language, so that the multilingual backbone shares information across languages while the adapters allow for per-language specialization.…”
Section: B. Adapters (mentioning, confidence: 99%)
“…In speech recognition, training a single recognizer for multiple languages has a long history [3], spanning Hidden Markov Model (HMM) based models [17,18] and hybrid models [19] through to end-to-end neural models with CTC [20,21] or sequence-to-sequence models [22,5,23,24,25,26], with the last approach inspired by the success of multilingual machine translation [1,2]. The literature especially notes the merits of disclosing the language identity (when the utterance is assumed to belong to a single language) to a model whose architecture is designed to incorporate the language information.…”
Section: Related Work and Comparison (mentioning, confidence: 99%)
“…Training a single ASR model to support multiple languages is promising but challenging [20,21,22,23]. Through shared learning of model parameters across languages [24,25,26], multilingual ASR models can outperform monolingual models, particularly for languages with less data.…”
Section: Multilingual Speech Recognition (mentioning, confidence: 99%)