2021
DOI: 10.48550/arxiv.2111.08137
Preprint

Joint Unsupervised and Supervised Training for Multilingual ASR

Abstract: Self-supervised training has shown promising gains in pretraining models and facilitating the downstream finetuning for speech recognition, like multilingual ASR. Most existing methods adopt a 2-stage scheme where the self-supervised loss is optimized in the first pretraining stage, and the standard supervised finetuning resumes in the second stage. In this paper, we propose an end-to-end (E2E) Joint Unsupervised and Supervised Training (JUST) method to combine the supervised RNN-T loss and the self-supervised…
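
The abstract describes a single-stage objective that sums the supervised RNN-T loss with the self-supervised losses instead of splitting them across pretraining and finetuning. Below is a minimal Python sketch of that combination; the loss callables, the single weight w_unsup, and the toy values are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a JUST-style joint objective: the supervised RNN-T loss
# and the self-supervised losses are summed and optimized together in one
# end-to-end stage (no separate pretrain/finetune phases). The loss callables
# and the single weight w_unsup are placeholders for illustration only.
from typing import Any, Callable, Dict

Batch = Dict[str, Any]


def joint_loss(
    batch: Batch,
    rnnt_loss: Callable[[Batch], float],         # supervised RNN-T loss
    contrastive_loss: Callable[[Batch], float],  # self-supervised contrastive loss
    mlm_loss: Callable[[Batch], float],          # self-supervised masked-prediction loss
    w_unsup: float = 1.0,                        # assumed weight on the unsupervised terms
) -> float:
    """Single training objective combining supervised and unsupervised terms."""
    return rnnt_loss(batch) + w_unsup * (contrastive_loss(batch) + mlm_loss(batch))


if __name__ == "__main__":
    # Toy usage with constant stand-in losses, just to show the combination.
    batch = {"features": None, "transcripts": None}
    print(joint_loss(batch, lambda b: 2.0, lambda b: 0.5, lambda b: 0.8))  # 3.3
```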

Cited by 1 publication (2 citation statements)
References 18 publications
“…Interestingly, with more finetune data, BEST-RQ performs even better than w2v-BERT, especially for pt and pl. Our results are also comparable with the previous state-of-the-art results in (Bai et al., 2021), which conducts joint training for multilingual ASR.…”
Section: Results on MLS-10hrs (supporting)
confidence: 89%
“…As our self-supervised learning algorithm eliminates the requirement of representation learning through applying a random-projection quantizer, it is crucial to understand the representation quality of this quantizer and how the quality of the quantization affects the self-supervised learning.…”

[Table spilled from the quoted passage: WER (%) on MLS per language]
Model                               en   de   nl    fr   es   it    pt    pl    Avg
JUST (Bai et al., 2021)             6.6  4.3  9.9   5.0  3.8  9.1   14.6  8.1   7.8
JUST (Bai et al., 2021) (co-train)  6.5  4.1  9.5   5.2  3.7  8.8   8.0   6.6   6.5
w2v-BERT (0.6B)                     5.5  4.3  10.9  5.6  4.5  10.1  13.4  11.2  8.2
BEST-RQ (Ours, 0.6B)                6.8  4.1  9.7   5.0  4.9  7.4   9.4   5.2   6.6

Section: Analyzing Quantization Quality (mentioning)
confidence: 99%
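
The quoted passage refers to a random-projection quantizer that replaces learned representation learning with frozen random parameters. The sketch below illustrates that idea in plain NumPy: each speech frame is projected with a fixed random matrix and labeled with the index of the nearest entry in a fixed random codebook. The array sizes, the codebook size, and the omission of feature normalization are assumptions for illustration, not the cited work's exact setup.

```python
# Illustrative random-projection quantizer: project each speech frame with a
# frozen random matrix, then label it with the index of the nearest entry in a
# frozen random codebook. Nothing here is learned; sizes are assumed.
import numpy as np


def random_projection_quantize(
    features: np.ndarray,    # (num_frames, feature_dim) speech features
    projection: np.ndarray,  # (feature_dim, code_dim) frozen random projection
    codebook: np.ndarray,    # (num_codes, code_dim) frozen random codebook
) -> np.ndarray:
    """Return one discrete target label per frame."""
    projected = features @ projection                                  # (num_frames, code_dim)
    # Euclidean distance from every projected frame to every codebook entry.
    dists = np.linalg.norm(projected[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.argmin(axis=-1)                                       # (num_frames,)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((100, 80))     # 100 frames of 80-dim features
    proj = rng.standard_normal((80, 16))       # assumed projection size
    codes = rng.standard_normal((1024, 16))    # small codebook for the toy example
    labels = random_projection_quantize(feats, proj, codes)
    print(labels.shape, labels[:5])
```

The resulting discrete labels can serve as prediction targets for a masked-prediction objective, which is why the quality of this quantization matters for the self-supervised learning discussed in the quoted section.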