2022
DOI: 10.48550/arxiv.2205.12446
Preprint

FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech

Abstract: We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of Speech benchmark. FLEURS is an n-way parallel speech dataset in 102 languages built on top of the machine translation FLoRes-101 benchmark, with approximately 12 hours of speech supervision per language. FLEURS can be used for a variety of speech tasks, including Automatic Speech Recognition (ASR), Speech Language Identification (Speech LangID), Translation and Retrieval. In this paper, we provide baselines for the tasks base…

Cited by 3 publications (8 citation statements)
References: 27 publications
“…• Automatic Speech Recognition (ASR): We use YouTube data to train USMs for YouTube (e.g., closed captions). We evaluate the USMs on two public benchmarks, SpeechStew [2] and FLEURS [16]. We also report results on the long-form test set CORAAL [17] for which only the evaluation set is available.…”
Section: Supervised ASR Training
Citation type: mentioning
confidence: 99%
“…SoTA results for downstream multilingual speech tasks: Our USM models achieve state-of-the-art performance for multilingual ASR and AST for multiple datasets in multiple domains. This includes SpeechStew (mono-lingual ASR) [2], CORAAL (African American Vernacular English (AAVE) ASR) [17], FLEURS (multi-lingual ASR) [16], YT (multilingual long-form ASR), and CoVoST (AST from English to multiple languages). We depict our model's performance in the first panel of Fig.…”
Section: Key Findings
Citation type: mentioning
confidence: 99%