Interspeech 2022
DOI: 10.21437/interspeech.2022-143
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale

Cited by 184 publications (69 citation statements)
References 35 publications
“…Despite being only half the size of our original model, the nine-layer sub-model significantly outperforms when evaluated on the VoxLingua107 test set. Moreover, its error rate improves on the original XLS-R results [2]. As seen in Table 4, this is a result of improved performance across both short (0–5 s) and long (5–20 s) utterances.…”
Section: Benchmarking
confidence: 79%
“…Alongside achieving state-of-the-art (SOTA) accuracy for automatic speech recognition (ASR), such models are notable for high performance in low-resource settings. This versatility becomes more pronounced when pre-training utilizes multilingual corpora, with fine-tuned models achieving superior accuracy in comparison to monolingual counterparts [2,3,4].…”
Section: Introduction
confidence: 99%
“…The three data types are amenable to being transcribed using ASR, taking the burden of manual transcription off researchers' hands. Freely accessible tools like DeepSpeech (Amodei et al., 2016; Hannun et al., 2014), wav2vec2 (Baevski et al., 2020) and XLS-R (Babu et al., 2021; Baevski et al., 2020; Schneider et al., 2019) include ASR for numerous languages. New tools like cross-lingual models are also making it possible to train models for under-resourced languages.…”
Section: What Data Do We Need?
confidence: 99%
“…Table VIII shows the results of replacing MFCC features with a set of publicly available neural representations: (i) multilingual bottleneck features, (ii) intermediate representations extracted from HuBERT [47], and (iii) XLS-R [48], which is a scaled-up version of XLSR [49]. We include k-means as it has recently been used to cluster HuBERT features for downstream applications that require discrete representations; we set the number of k-means clusters to 100 to match the truncation parameter of our Dirichlet process-based models.…”
Section: H. Subspace Models With Neural Representations
confidence: 99%
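The last excerpt describes discretizing neural speech representations by clustering frame-level features with k-means. As a minimal sketch of that discretization step — using plain NumPy and synthetic vectors standing in for HuBERT/XLS-R frame features (the cited work uses 100 clusters; a small k is used here only for illustration):

```python
import numpy as np

def kmeans(features, k=4, iters=20, seed=0):
    """Plain k-means over frame features: returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct random frames.
    centroids = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each frame to its nearest centroid (its discrete unit).
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned frames;
        # keep the old centroid if a cluster emptied out.
        for j in range(k):
            if (assign == j).any():
                centroids[j] = features[assign == j].mean(axis=0)
    return centroids, assign

# Synthetic "frame features" standing in for extracted model activations.
rng = np.random.default_rng(1)
frames = np.concatenate(
    [rng.normal(loc=m, scale=0.1, size=(50, 8)) for m in (0.0, 1.0, 2.0, 3.0)]
)
centroids, units = kmeans(frames, k=4)
# `units` now holds one discrete cluster id per frame, the kind of
# token sequence downstream discrete-representation models consume.
```

In practice this step is usually run with an off-the-shelf implementation (e.g. scikit-learn's `KMeans`) on features dumped from an intermediate transformer layer; the sketch above only shows the shape of the computation.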