Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries 2015
DOI: 10.1145/2756406.2756912

How Well Are Arabic Websites Archived?

Abstract: It has long been anecdotally known that web archives and search engines favor Western and English-language sites. In this paper we quantitatively explore how well Arabic-language websites are indexed and archived. We began by sampling 15,092 unique URIs from three different website directories: DMOZ (multilingual), Raddadi and Star28 (both primarily Arabic-language). Using language identification tools we eliminated pages not in the Arabic language (e.g., English-language versions of Al-Jazeera sites) and…

Cited by 11 publications (8 citation statements)
References 15 publications
“…In previous works on web dynamics, suitable datasets had to be crawled first, which is tedious and can only be done for shorter periods (Cho and Garcia-Molina, 2000; Fetterly et al, 2003; Koehler, 2002; Adar et al, 2009). With access to existing archives, more recent studies of the Web were conducted retrospectively on available data (Hale et al, 2014; Agata et al, 2014; Alkwai et al, 2015), commonly with a focus on a particular subset, such as national domains or topical subsets. These kinds of works are typical data-centric tasks as they require access to archived raw data or metadata records.…”
Section: Metadata Analysis (mentioning)
confidence: 99%
“…Ainsworth et al (Ainsworth et al, 2011) investigated how much of the web was archived and estimated that 35-90% of existing web resources have at least one Memento. Alkwai et al estimated the archive coverage of Arabic websites (Alkwai et al, 2015), and later conducted an additional study (Alkwai et al, 2017) to compare the archiving rates of English-, Arabic-, Danish- and Korean-language web pages. Alkwai showed that English has a higher archiving rate than Arabic, which in turn has a higher archiving rate than Danish or Korean.…”
Section: Age and Availability Of Resources (mentioning)
confidence: 99%
“…They reported that about 15% to 31% (depending on the sample) URIs are archived at least once per month. Alkwai et al revisited the archival rate question in 2015, but for web pages of specific languages [17], [18]. They collected over 15,000 URI samples from English, Arabic, Danish, and Korean languages to find out how much of the pages from each of these languages are archived.…”
Section: Related Work (mentioning)
confidence: 99%
“…With access to existing Web archives, more recent studies of the Web were conducted retrospectively on available data [29,30,31]. However, instead of analyzing the whole archive at once, all of them focus on a particular subset, such as national domains.…”
Section: Web Dynamics Analysis (mentioning)
confidence: 99%