2022 IEEE Spoken Language Technology Workshop (SLT), 2023
DOI: 10.1109/slt54892.2023.10023141

FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech

Cited by 57 publications (28 citation statements). References 31 publications.

“…1. Speech Recognition: We evaluated on the test set of SpeechStew ASR [24], VoxPopuli ASR [25], and FLEURS ASR [26]. The performance was computed in terms of word error rate (WER) using the JiWER implementation [27].…”
Section: Discussion
confidence: 99%
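The JiWER package cited above is a small Python library for computing word error rate. A minimal sketch of the scoring step, assuming jiwer's top-level `wer` function; the transcripts are illustrative placeholders, not data from the cited evaluation:

```python
# Sketch: WER computation with the jiwer package (pip install jiwer).
# The reference/hypothesis pairs are illustrative placeholders.
import jiwer

references = [
    "the quick brown fox jumps over the lazy dog",
    "systems are evaluated with word error rate",
]
hypotheses = [
    "the quick brown fox jumped over the lazy dog",
    "systems are evaluated with word error rates",
]

# jiwer.wer aggregates substitutions, deletions, and insertions
# across all reference/hypothesis pairs into a single corpus-level WER.
print(f"WER: {jiwer.wer(references, hypotheses):.2%}")
```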
“…We present the results of ASR evaluation in Table 1 on an English corpus using SpeechStew [24], as well as on multilingual corpora using Voxpopuli [25] and FLEURS [26]. The instructions during evaluation are similar to the ones in training (e.g., "Recognize this speech in {lang}").…”
Section: Speech Recognition
confidence: 99%
“…Experiments on MuST-C (Cattoni et al, 2021) reveal that our method is not only better than existing zero-shot models, by a large margin, but also surpasses supervised ones, achieving state-of-the-art results. On CoVoST (Wang et al, 2020b), ZEROSWOT outperforms the original version of the multimodal SEAMLESSM4T (Seamless-Communication, 2023a), while evaluations on the 88 target languages of FLEURS (Conneau et al, 2023) showcase the massively multilingual capacity of our method. ZEROSWOT is also vastly superior to comparable CTC-based cascade ST models, and while it is on par with cascades that utilize strong attention-based encoder-decoder ASR models, it ranks better in terms of efficiency.…”
Section: Introduction
confidence: 84%
“…All these evaluations for speech were performed on the FLEURS test set (Conneau et al, 2023), an N-way parallel speech dataset in 102 languages built on top of the text FLORES-101 benchmark.…”
Section: Evaluations
confidence: 99%
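FLEURS itself is straightforward to load for this kind of evaluation. A minimal sketch using the Hugging Face `datasets` library, assuming the published `google/fleurs` dataset ID and its per-language configurations (e.g. `en_us`); field names follow the dataset card and may differ across versions:

```python
# Sketch: loading the FLEURS test split via Hugging Face datasets
# (pip install datasets). Assumes the "google/fleurs" dataset ID;
# FLEURS ships one configuration per language, e.g. "en_us".
from datasets import load_dataset

fleurs_test = load_dataset("google/fleurs", "en_us", split="test")

example = fleurs_test[0]
audio = example["audio"]              # dict with "array" and "sampling_rate" (16 kHz)
reference = example["transcription"]  # transcript used as the WER reference
print(len(fleurs_test), audio["sampling_rate"], reference[:60])
```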