Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2020
DOI: 10.18653/v1/2020.emnlp-main.185

XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning

Abstract: In order to simulate human language capacity, natural language processing systems must be able to reason about the dynamics of everyday situations, including their possible causes and effects. Moreover, they should be able to generalise the acquired world knowledge to new languages, modulo cultural differences. Advances in machine reasoning and cross-lingual transfer depend on the availability of challenging evaluation benchmarks. Motivated by both demands, we introduce Cross-lingual Choice of Plausible Altern…
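XCOPA was obtained by translating the English COPA data, so it keeps COPA's two-choice format: each item gives a premise and two alternatives and asks which one is the more plausible cause or effect of the premise. The sketch below illustrates that format and how accuracy is computed over it. The field names (`premise`, `choice1`, `choice2`, `question`, `label`), the example item, and the pluggable `score` function are illustrative assumptions. They are not the official data loader or any particular model's scoring method.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical scorer signature: (premise, alternative, question) -> plausibility.
Scorer = Callable[[str, str, str], float]


@dataclass
class CopaItem:
    """One two-choice item in the COPA style (illustrative field names)."""
    premise: str
    choice1: str
    choice2: str
    question: str  # "cause" or "effect": which relation to the premise is asked
    label: int     # 0 if choice1 is the correct alternative, 1 if choice2

    def __post_init__(self) -> None:
        if self.question not in ("cause", "effect"):
            raise ValueError(f"unexpected question type: {self.question!r}")
        if self.label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {self.label}")


def predict(item: CopaItem, score: Scorer) -> int:
    """Pick the alternative the scorer rates as more plausible (ties go to choice1)."""
    s1 = score(item.premise, item.choice1, item.question)
    s2 = score(item.premise, item.choice2, item.question)
    return 0 if s1 >= s2 else 1


def accuracy(items: List[CopaItem], score: Scorer) -> float:
    """Fraction of items answered correctly; random guessing averages 0.5."""
    if not items:
        raise ValueError("accuracy is undefined for an empty item list")
    correct = sum(predict(item, score) == item.label for item in items)
    return correct / len(items)


if __name__ == "__main__":
    example = CopaItem(
        premise="The man broke his toe.",
        choice1="He got a hole in his sock.",
        choice2="He dropped a hammer on his foot.",
        question="cause",
        label=1,
    )
    # Toy scorer for demonstration only: prefers the longer alternative.
    longer_is_better: Scorer = lambda premise, alt, question: float(len(alt))
    print(accuracy([example], longer_is_better))
```

In a cross-lingual evaluation, the same `accuracy` routine would be applied separately to each target language's items. Because every item offers exactly two alternatives, the 50% chance level is the same for every language.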

Cited by 72 publications (53 citation statements)
References: 61 publications
“…Another multilingual data set, PAWS-X (Yang et al 2019), focused on the paraphrase identification task and was created translating the original English PAWS (Zhang, Baldridge, and He 2019) into 6 languages. XCOPA (Ponti et al 2020) is a crosslingual data set for the evaluation of crosslingual causal commonsense reasoning, obtained through translation of the English COPA data (Roemmele, Bejan, and Gordon 2011) to 11 target languages. A large number of tasks have been recently integrated into unified multilingual evaluation suites: XTREME (Hu et al 2020) and XGLUE (Liang et al 2020).…”
Section: Previous Work and Evaluation Data (mentioning)
“…As a result, many recent benchmarking and dataset creation efforts in NLU develop and focus on tasks that are inherently multilingual or which explore cross-lingual transfer. For example, XTREME (Hu et al, 2020) introduces a benchmark covering 40 languages across multiple NLU and retrieval tasks, XCOPA (Ponti et al, 2020) is a commonsense reasoning dataset for eleven languages, and MLQA ) is a dataset for extractive question answering across seven languages. We can observe a similar recent trend in natural language generation, where ML-Sum and WikiLingua were created as multilingual summarization datasets.…”
Section: Increasing Multilingualism of NLG Research (mentioning)
“…Commonsense datasets in multiple languages or languages other than English have also been created recently. XCOPA (Ponti et al, 2020) In the aspect of commonsense reasoning, our dataset is different from the mentioned commonsense datasets in that we detect and annotate errors in machine-generated texts, which violates common sense, rather than creating examples to examine the commonsense reasoning ability of machines.…”
Section: Commonsense Datasets (mentioning)