Findings of the Association for Computational Linguistics: EMNLP 2022
DOI: 10.18653/v1/2022.findings-emnlp.273

Beyond Counting Datasets: A Survey of Multilingual Dataset Construction and Necessary Resources

Cited by 6 publications
(3 citation statements)
“…Despite the plethora of works on multilingual prompting, little to no African languages are usually contained in the evaluation sets of nearly all of these works. When present, they are often obtained by translating the existing datasets of other languages (Yu et al., 2022). This method has been shown to contain artifacts that can inflate the performance of models evaluated on such datasets (Artetxe et al., 2020). Ahuja et al. (2023) performs a comprehensive evaluation of the GPT models on standard NLP benchmarks, covering 16 NLP datasets across 70 typologically diverse languages.…”
Section: Related Work (mentioning)
confidence: 99%
“…The study of bilingualism has long been a topic of interest among linguists (Yu et al., 2022; Hoffmann, 2014), as it provides insight into the mechanisms of language acquisition and processing. Furthermore, research on multilingualism has contributed to the development of more effective machine learning models, such as neural translation systems (Zou et al., 2013).…”
Section: Introduction (mentioning)
confidence: 99%
“…Advances in multilingual natural language processing (NLP) technologies (Dabre et al., 2020; Hedderich et al., 2021) have raised the enticing possibilities of NLP systems that benefit all people around the world. However, at the same time, studies into the state of multilingual NLP have demonstrated stark differences in the amount of resources available (Joshi et al., 2020; Yu et al., 2022) and performance of existing NLP systems (Blasi et al., 2022; Khanuja et al., 2023; Ahia et al., 2023).…”
Section: Introduction (mentioning)
confidence: 99%