2021
DOI: 10.1162/tacl_a_00419

ParsiNLU: A Suite of Language Understanding Challenges for Persian

Abstract: Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, the majority of this progress remains to be concentrated on resource-rich languages like English. This work focuses on Persian language, one of the widely spoken languages in the world, and yet there are few NLU datasets available for this language. The availability of high-quality evaluation datasets is a necessity for reliable assessment of the progress on different NLU tasks and domains. We introduce Par…

citations
Cited by 9 publications
(6 citation statements)
references
References 53 publications
“…In order to study if ARMAN works well as a language model, we tested our models in Natural Language Understanding (NLU) tasks. According to Khashabi et al (2020), we selected multiple-choice question-answering, textual entailment, sentiment analysis, and question paraphrasing tasks to examine our models' performance on them. For more information about these tasks and datasets, see Appendix A and Khashabi et al (2020).…”
Section: NLU Results (mentioning)
confidence: 99%
“…ParsiNLU (Khashabi et al, 2020) is a collection of NLU tasks for the Persian language including Textual Entailment, Sentiment Analysis, Question Paraphrasing, Multiple Choice Question Answering, and Reading Comprehension tasks. We have fine-tuned our models on most of them to test their performances on NLU tasks.…”
Section: Downstream Datasets (mentioning)
confidence: 99%
“…Hence, in this paper, we create a native QA dataset for the Persian language. Khashabi et al [34] created a Persian QA dataset containing 1300 instances and trained a QA system using this dataset. To the best of our knowledge, currently, there is no native large-scale QA dataset for answering the Persian questions, neither as a monolingual nor as a cross-lingual dataset.…”
Section: B Other Languages (mentioning)
confidence: 99%
“…[7] | 2018 | English | Native | 150K+
Wikiqa: A challenge dataset for open-domain question answering [8] | 2015 | English | Native | 3K+
MS MARCO: A human generated machine reading comprehension dataset [9] | 2016 | English | Native | 100K+
Natural questions: a benchmark for question answering research [10] | 2019 | English | Native | 300K+
Quac: Question answering in context [11] | 2018 | English | Native | 100K+
Coqa: A conversational question answering challenge [12] | 2019 | English | Native | 127K+
Newsqa: A machine comprehension dataset [13] | 2016 | English | Native | 100K+
Constructing datasets for multi-hop reading comprehension across documents [15] | 2018 | English | Native, Multi-hop | 50K+
Hotpotqa: A dataset for diverse, explainable multi-hop question answering [16] | 2018 | English | Native, Multi-hop | 113K+
Repartitioning of the complexwebquestions dataset [17] | 2018 | English | Native, Multi-hop | 63K+
R4C: A benchmark for evaluating RC systems to get the right answer for the right reason [18] | 2019 | English | Native, Multi-hop | 4K+
Automatic spanish translation of the squad dataset for multilingual question answering [19] | 2019 | Spanish | Translation | 100K+
Neural arabic question answering [20] | 2019 | Arabic | Translation | 48K+
Semi-supervised training data generation for multilingual question answering [21] | 2018 | Korean | Translation | 81K+
Neural learning for question answering in italian [22] | 2018 | Italian | Translation | 60K+
SberQuAD-Russian reading comprehension dataset: Description and analysis [24] | 2020 | Russian | Native | 50K+
Drcd: a chinese machine reading comprehension dataset [25] | 2018 | Chinese | Native | 30K+
Korquad1.0: Korean qa dataset for machine reading comprehension [26] | 2018 | Korean | Native | 70K+
Project PIAF: Building a Native French Question-Answering Dataset [27] | 2020 | French | Native | 3K+
Parsinlu: a suite of language understanding challenges for persian [34] | 2021 | Persian | Native | 1K+
ParSQuAD: Persian Question Answering Dataset based on Machine Translation of SQuAD 2.0 [33] | 2021 | Persian | Translation | 25K, 70K…”
Section: B Other Languages (mentioning)
confidence: 99%