Interspeech 2019
DOI: 10.21437/Interspeech.2019-3052

Multilingual Speech Recognition with Corpus Relatedness Sampling

Abstract: Multilingual acoustic models have been successfully applied to low-resource speech recognition. Most existing works have combined many small corpora together and pretrained a multilingual model by sampling from each corpus uniformly. The model is eventually fine-tuned on each target corpus. This approach, however, fails to exploit the relatedness and similarity among corpora in the training set. For example, the target corpus might benefit more from a corpus in the same domain or a corpus from a close language…
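The abstract's central contrast is between uniform corpus sampling and sampling biased toward corpora related to the target. Below is a minimal Python sketch of relatedness-weighted sampling; the corpus names, batch placeholders, and hand-set relatedness scores are illustrative assumptions, not the paper's actual relatedness estimates.

import random

# Toy batches per corpus; placeholders for real acoustic-feature batches.
corpora = {
    "target_lang":  [f"tgt_batch_{i}" for i in range(4)],
    "close_lang":   [f"cls_batch_{i}" for i in range(4)],
    "distant_lang": [f"dst_batch_{i}" for i in range(4)],
}

# Hypothetical relatedness of each corpus to the target; in the paper this
# would be estimated, not hand-set.
relatedness = {"target_lang": 1.0, "close_lang": 0.7, "distant_lang": 0.1}

def sample_batch(corpora, weights):
    """Pick a corpus with probability proportional to its relatedness,
    then pick one batch uniformly from that corpus."""
    names = list(corpora)
    name = random.choices(names, weights=[weights[n] for n in names], k=1)[0]
    return random.choice(corpora[name])

print(sample_batch(corpora, relatedness))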

Cited by 18 publications (14 citation statements, all classified as mentioning). References 37 publications.
“…Burget et al. [10] share the parameters of a Gaussian Mixture Model. Closer to our work, several works have shared the parameters of a neural network encoder, using feedforward networks [3, 1, 2] or LSTMs [11]. The model is then fine-tuned on the target low-resource language to fit its specificities [12].…”
Section: Multilingual Pre-training for Speech Recognition (mentioning)
Confidence: 99%
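For context, here is a minimal sketch of the shared-encoder setup the quote describes: one encoder shared across languages, with a language-specific output layer per phoneme inventory. It assumes PyTorch, an LSTM encoder, and made-up inventory sizes; the cited systems differ in architecture and detail.

import torch
from torch import nn

class SharedEncoderASR(nn.Module):
    """One encoder shared across all languages; one output layer per
    language-specific phoneme inventory."""
    def __init__(self, feat_dim, hidden_dim, inventories):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden_dim, num_layers=2, batch_first=True)
        self.heads = nn.ModuleDict(
            {lang: nn.Linear(hidden_dim, size) for lang, size in inventories.items()}
        )

    def forward(self, feats, lang):
        encoded, _ = self.encoder(feats)   # shared parameters
        return self.heads[lang](encoded)   # language-specific phoneme logits

# Made-up inventory sizes; fine-tuning would continue training the shared
# encoder (and the target head) on the target corpus alone.
model = SharedEncoderASR(feat_dim=40, hidden_dim=256, inventories={"eng": 42, "swa": 38})
logits = model(torch.randn(2, 100, 40), lang="swa")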
“…The model is then fine-tuned on the target low-resource language to fit its specificities [12]. The sampling of the languages during pre-training can focus on languages related to the targeted language [11]. Another approach is to encourage a language-independent encoder with an adversarial loss [13].…”
Section: Multilingual Pre-training for Speech Recognition (mentioning)
Confidence: 99%
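One standard way to realize the adversarial loss mentioned above (not necessarily the exact method of [13]) is a gradient-reversal layer feeding a language classifier. A sketch, assuming PyTorch and made-up dimensions:

import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) gradients in the
    backward pass, so the encoder is trained to fool the language classifier."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class LanguageAdversary(nn.Module):
    """Language classifier on top of encoder features. Because of the
    reversed gradient, minimizing its loss pushes the encoder toward
    language-independent representations."""
    def __init__(self, feat_dim, n_langs, lam=1.0):
        super().__init__()
        self.lam = lam
        self.classifier = nn.Linear(feat_dim, n_langs)

    def forward(self, features):
        return self.classifier(GradReverse.apply(features, self.lam))

# Usage with made-up shapes: 8 utterances, 256-dim encoder output, 11 languages.
adv = LanguageAdversary(feat_dim=256, n_langs=11)
logits = adv(torch.randn(8, 256))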
“…We train the acoustic model with stochastic gradient descent, using a learning rate of 0.005. In each iteration, we apply uniform sampling (Li et al. 2019): first randomly select a corpus from the entire training set, then randomly choose one batch from that corpus. Our baseline model is the multilingual acoustic model with a shared phoneme inventory.…”
Section: Experimental Settings (mentioning)
Confidence: 99%
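A minimal training-loop sketch of the two-step uniform sampling described in the quote, assuming PyTorch; the model, batches, and loss are toy placeholders, while the 0.005 learning rate comes from the passage itself:

import random
import torch
from torch import nn

# Toy stand-ins: a linear "acoustic model" and random feature batches per
# corpus; real batches would be acoustic features with phoneme targets.
model = nn.Linear(40, 10)
batches_by_corpus = {
    "corpus_a": [torch.randn(8, 40) for _ in range(5)],
    "corpus_b": [torch.randn(8, 40) for _ in range(5)],
}

# Learning rate 0.005 as stated in the quoted passage.
optimizer = torch.optim.SGD(model.parameters(), lr=0.005)

for step in range(100):
    corpus = random.choice(list(batches_by_corpus))    # 1) pick a corpus uniformly
    batch = random.choice(batches_by_corpus[corpus])   # 2) pick a batch uniformly
    loss = model(batch).pow(2).mean()                  # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()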
“…We associate languages from the two datasets with ISO 639-3 codes and extract 676 languages. To recognize the phones of various languages, a language-independent phone recognition model is required, because any language-dependent system can only discover the phones of that specific language [16]. Additionally, language-dependent systems cannot distinguish allophones that might be crucial in other languages [17].…”
Section: Phone Recognition (mentioning)
Confidence: 99%
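To illustrate the allophone point with a tiny hypothetical example: [p] and [pʰ] are allophones of a single English phoneme but distinct phonemes in Hindi, so a system trained only on English-level targets would never learn to separate them. All mappings below are illustrative assumptions, not data from the cited work.

# Hypothetical allophone-to-phoneme maps. English collapses aspirated and
# unaspirated stops into one phoneme; Hindi keeps them distinct.
ALLOPHONE_MAP = {
    "eng": {"p": "p", "pʰ": "p"},
    "hin": {"p": "p", "pʰ": "pʰ"},
}

def phones_to_phonemes(phones, lang):
    """Map language-independent phones to a given language's phonemes."""
    return [ALLOPHONE_MAP[lang].get(p, p) for p in phones]

print(phones_to_phonemes(["p", "pʰ"], "eng"))  # ['p', 'p']
print(phones_to_phonemes(["p", "pʰ"], "hin"))  # ['p', 'pʰ']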