Proceedings of the 10th ACM Conference on Web Science 2018
DOI: 10.1145/3201064.3201091

Automated Discovery of Internet Censorship by Web Crawling

Abstract: Censorship of the Internet is widespread around the world. As access to the web becomes increasingly ubiquitous, filtering of this resource becomes more pervasive. Transparency about the specific content that citizens are denied access to is atypical. To counter this, numerous techniques for maintaining URL filter lists have been proposed by individuals and organisations that aim to provide empirical data on censorship for the benefit of the public and the wider censorship research community. We present a new approach for …
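As a rough illustration of how crawling-based discovery of censored pages can work, the sketch below follows links outward from seed URLs and records any URL whose response resembles a block page. This is a minimal sketch only, not the paper's implementation: the marker strings in BLOCK_PAGE_MARKERS, the page limit, and all function names are hypothetical assumptions.

    # Minimal sketch of crawling-based censorship discovery (illustrative only,
    # not the authors' implementation). Assumes the censor serves an
    # identifiable block page; the marker strings below are hypothetical.
    import requests
    from urllib.parse import urljoin
    from html.parser import HTMLParser

    BLOCK_PAGE_MARKERS = [
        "this site has been blocked",      # hypothetical block-page phrases
        "access to this page is denied",
    ]

    class LinkExtractor(HTMLParser):
        """Collects href values from anchor tags on a fetched page."""
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def looks_blocked(body: str) -> bool:
        body = body.lower()
        return any(marker in body for marker in BLOCK_PAGE_MARKERS)

    def crawl(seed_urls, max_pages=100):
        """Breadth-first crawl that flags URLs whose responses match a block page."""
        seen, frontier, blocked = set(), list(seed_urls), []
        while frontier and len(seen) < max_pages:
            url = frontier.pop(0)
            if url in seen:
                continue
            seen.add(url)
            try:
                resp = requests.get(url, timeout=10)
            except requests.RequestException:
                continue
            if looks_blocked(resp.text):
                blocked.append(url)
                continue  # a block page yields no useful outlinks
            parser = LinkExtractor()
            parser.feed(resp.text)
            frontier.extend(urljoin(url, link) for link in parser.links)
        return blocked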


Cited by 6 publications (2 citation statements). References 21 publications (24 reference statements).
“…Censored domains - The second stage of the experiment was repeated against popular domains that we knew to be censored. We gathered these domains using the technique described by Darer et al. [7,8]. This list was also based on backlinks from web pages, similar to the Majestic Million, so domains with more links to them from other sites were chosen over domains that had only a few backlinks.…”
Section: Methods
confidence: 99%
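The backlink-based selection described in this statement can be illustrated with a short sketch: count how many distinct referring pages link to each candidate domain and keep the most-linked domains, in the spirit of the Majestic Million ranking. The (source, target) pair format and the top_n cutoff are assumptions, not the cited technique's actual data model.

    # Illustrative sketch of backlink-based domain selection: rank candidate
    # domains by the number of distinct referring pages linking to them.
    from collections import Counter
    from urllib.parse import urlparse

    def rank_by_backlinks(link_pairs, top_n=50):
        """link_pairs: iterable of (source_page_url, target_url) tuples."""
        seen = set()
        counts = Counter()
        for source, target in link_pairs:
            domain = urlparse(target).netloc
            if domain and (source, domain) not in seen:
                seen.add((source, domain))
                counts[domain] += 1   # count one backlink per referring page
        return [domain for domain, _ in counts.most_common(top_n)]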
“…This approach aims to produce a continually-updated list of blocked terms that could be used to maintain an understanding of those terms most offensive to the filtering authorities. Similarly, Darer et al. [9,10] have used keyword- and crawling-based approaches to discover previously unidentified blocked domains.…”
Section: Technical Analysis
confidence: 99%
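For the keyword-based side of this approach, a hedged sketch follows: place a candidate keyword into an otherwise innocuous request and compare the outcome with a keyword-free control request; a reset or timeout that only the keyword-bearing request triggers is weak evidence of on-path keyword filtering. The control URL, query parameter, and failure heuristic are all illustrative assumptions, not Darer et al.'s actual method.

    # Hedged sketch of keyword-based probing. All endpoints and heuristics
    # here are hypothetical; real measurement needs retries and vantage care.
    import requests

    CONTROL_URL = "http://example.com/"   # hypothetical innocuous endpoint

    def keyword_filtered(keyword: str, timeout: float = 10.0) -> bool:
        try:
            requests.get(CONTROL_URL, timeout=timeout)   # keyword-free control
        except requests.RequestException:
            return False  # network itself is unreliable; result inconclusive
        try:
            # Same endpoint, but with the candidate keyword in the query string.
            requests.get(CONTROL_URL, params={"q": keyword}, timeout=timeout)
            return False
        except requests.RequestException:
            return True   # only the keyword-bearing request failed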