Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP) 2022
DOI: 10.18653/v1/2022.wanlp-1.34

Learning From Arabic Corpora But Not Always From Arabic Speakers: A Case Study of the Arabic Wikipedia Editions

Cited by 1 publication (8 citation statements)
References 0 publications
“…We study this problem from two perspectives: template-translated corpora and bot-generated corpora. For the template-translated corpora, Alshahrani et al (2022) have studied the Arabic Wikipedia editions and shown that more than one million articles in the Egyptian Arabic Wikipedia have been directly translated using simple templates that lack rich content from the English language with the help of the off-the-shelf translation tools like Google Translate. These translation tools generally perform well, but not perfectly, and have several serious problems, such as gender bias, that could adversely affect the translated content (Prates et al, 2020; Ullmann and Saunders, 2021; Lopez-Medel, 2021).…”
Section: Problem Of Unrepresentative Corpora
Citation type: mentioning
confidence: 99%
“…For the bot-generated corpora, a few recent research have shed light on the bots' activities on the Wikipedia project and their possible negative impacts on the quality of Wikipedia corpora (Tsvetkova et al, 2017; Zheng et al, 2019; Alshahrani et al, 2023). The root problem with the bots is that they can rapidly create Wikipedia articles (content pages) or edit the contents of those articles without any humans in the loop (Adler et al, 2008; Kang et al, 2021; Alshahrani et al, 2022). In this paper, we quantify the bots' activities in all Wikipedia editions and study the Arabic Wikipedia editions closely, specifically activities on their articles.…”
Section: Problem Of Unrepresentative Corpora
Citation type: mentioning
confidence: 99%