2021
DOI: 10.48550/arxiv.2104.08726
Preprint

AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages

Abstract: Pretrained multilingual models are able to perform cross-lingual transfer in a zero-shot setting, even for languages unseen during pretraining. However, prior work evaluating performance on unseen languages has largely been limited to low-level, syntactic tasks, and it remains unclear if zero-shot learning of high-level, semantic tasks is possible for unseen languages. To explore this question, we present AmericasNLI, an extension of XNLI (Conneau et al., 2018) to 10 indigenous languages of the Americas. We co…
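
The zero-shot setup the abstract describes can be illustrated with a minimal sketch (not the paper's code): a multilingual model fine-tuned on NLI data in high-resource languages is applied unchanged to a premise/hypothesis pair in another language. The checkpoint name and the example sentence pair below are assumptions for illustration only; joeddav/xlm-roberta-large-xnli is a public community model standing in for the XLM-R-based setup the paper evaluates.

```python
# Minimal sketch of zero-shot cross-lingual NLI (not the paper's code).
# An XLM-R model fine-tuned on XNLI is applied directly to a sentence
# pair in a language it was never fine-tuned on.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumption: illustrative public checkpoint, not the paper's model.
model_name = "joeddav/xlm-roberta-large-xnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Invented Spanish example pair (Spanish is the translation source for
# the AmericasNLI data).
premise = "El perro duerme en el sofá."
hypothesis = "Un animal está descansando."

# Encode premise and hypothesis as a single sequence pair.
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Label order depends on the checkpoint's config; read it rather than
# hard-coding indices.
pred = model.config.id2label[int(logits.argmax(dim=-1))]
print(pred)  # expected: "entailment"
```

For a genuinely unseen language, the premise and hypothesis would instead come from one of the 10 indigenous languages in AmericasNLI, with no change to the model.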

Cited by 1 publication (2 citation statements)
References 25 publications
“…Most of these languages are spoken by millions of people, despite being considered low-resource in the research community.”
[Flattened table from the citing paper, listing benchmarks and their language counts: AmericasNLI (Ebrahimi et al., 2021) 10; ALT (Riza et al., 2016) 13; Europarl (Koehn, 2005) 21; TICO-19 (Anastasopoulos et al., 2020) 36; OPUS-100 (Zhang et al., 2020) 100; M2M 100…]
Section: Languages in FLORES-101 (mentioning; confidence: 99%)
“…At present, there are very few benchmarks on low-resource languages. These often have very low coverage of low-resource languages (Riza et al., 2016; Thu et al., 2016; Barrault et al., 2020b; ∀ et al., 2020; Ebrahimi et al., 2021; Kuwanto et al., 2021), limiting our understanding of how well methods generalize and scale to a larger number of languages with a diversity of linguistic features. There are some benchmarks that have high coverage, but these are often in specific domains, like COVID-19 (Anastasopoulos et al., 2020) or religious texts (Christodouloupoulos and Steedman, 2015; Malaviya et al., 2017; Tiedemann, 2018; Agić and Vulić, 2019); or have low quality because they are built using automatic approaches (Zhang et al., 2020; Schwenk et al., 2019, 2021).…”
Section: Introduction (mentioning; confidence: 99%)