Findings of the Association for Computational Linguistics: ACL 2022
DOI: 10.18653/v1/2022.findings-acl.44

Toward More Meaningful Resources for Lower-resourced Languages

Cited by 2 publications (5 citation statements)
“…Digital language divide refers to the gap between languages with and without a considerable representation within the worldwide digital infrastructure. As shown by Kornai (2013) about 10 years ago, less than 5% of the world's 7-8000 languages have a remotely significant representation on the Internet, and despite the progresses of a decade, the gap has barely shrunk (Joshi et al, 2020). The political dimension of the divide is most evident when reconstructing the argument of size, which of course matters in the rapid upscaling of digital support for certain languages, but is at once a result of imperialist politics and far from the only determining factor in digital support.…”
Section: Introduction (mentioning)
confidence: 84%
“…neural networks; and (3) it requires very little to no knowledge from experts or speakers of the languages targeted. The typical low-resource research contribution thus scrapes web content, such as Wikipedia pages, written in the languages in question, often without any understanding of their quality or content (Lignos et al, 2022). It then trains or fine-tunes deep learning models based on the data, and finally demonstrates a few percentages of increase in quality (precision, recall, BLEU, etc.)…”
Section: Methodological Causes Of Language Modeling Bias (mentioning)
confidence: 99%
“…(1) In the context of corpus generation, (Lignos et al, 2022) report that researchers are not always familiar with the corpora they are using. For instance, when Wikipedia is scraped automatically, the contents of pages can be of low quality due to the use of machine translation, or may not even correspond to the language by which they are tagged.…”
Section: Methodology As a Source Of Bias (mentioning)
confidence: 99%
“…Despite its undeniable results over hundreds of languages, the linguistics-unaware modus operandi of neural language research has been criticised from multiple perspectives. From a methodological point of view, due to an insufficient understanding of researchers about the corpora, the languages and, ultimately, the cultures being worked upon, major quality problems in research output remain hidden behind precision-recall figures and eventually go unnoticed by the scientific community (Lignos et al, 2022). Ethics-wise, the attitude of first-world experts who pretend to 'save the day' in the Global South by applying blanket solutions to languages with which they have no contact or understanding has been pointed out as fundamentally neocolonial (Bird, 2020;Schwartz, 2022).…”
Section: Introduction (mentioning)
confidence: 99%