Abstract-Various methods have been proposed for creating and maintaining lists of potentially filtered URLs to allow for measurement of ongoing internet censorship around the world. Whilst testing a known resource for evidence of filtering can be relatively simple, given appropriate vantage points, discovering previously unknown filtered web resources remains an open challenge.We present a new framework for automating the process of discovering filtered resources through the use of adaptive queries to well-known search engines. Our system applies information retrieval algorithms to isolate characteristic linguistic patterns in known filtered web pages; these are then used as the basis for web search queries. The results of these queries are then checked for evidence of filtering, and newly discovered filtered resources are fed back into the system to detect further filtered content.Our implementation of this framework, applied to China as a case study, shows that this approach is demonstrably effective at detecting significant numbers of previously unknown filtered web pages, making a significant contribution to the ongoing detection of internet filtering as it develops.Our tool is currently deployed and has been used to discover 1355 domains that are poisoned within China as of Feb 2017-30 times more than are contained in the most widely-used public filter list. Of these, 759 are outside of the Alexa Top 1000 domains list, demonstrating the capability of this framework to find more obscure filtered content. Further, our initial analysis of filtered URLs, and the search terms that were used to discover them, gives further insight into the nature of the content currently being blocked in China.
Anecdotal evidence suggests an increasing number of people are turning to VPN services for the properties of privacy, anonymity and free communication over the internet. Despite this, there is little research into what these services are actually being used for. We use DNS cache snooping to determine what domains people are accessing through VPNs. This technique is used to discover whether certain queries have been made against a particular DNS server. Some VPNs operate their own DNS servers, ensuring that any cached queries were made by users of the VPN. We explore 3 methods of DNS cache snooping and briefly discuss their strengths and limitations. Using the most reliable of the methods, we perform a DNS cache snooping scan against the DNS servers of several major VPN providers. With this we discover which domains are actually accessed through VPNs. We run this technique against popular domains, as well as those known to be censored in certain countries; China, Indonesia, Iran, and Turkey. Our work gives a glimpse into what users use VPNs for, and provides a technique for discovering the frequency with which domain records are accessed on a DNS server.
Censorship of the Internet is widespread around the world. As access to the web becomes increasingly ubiquitous, filtering of this resource becomes more pervasive. Transparency about specific content that citizens are denied access to is atypical. To counter this, numerous techniques for maintaining URL filter lists have been proposed by various individuals and organisations that aim to empirical data on censorship for benefit of the public and wider censorship research community.We present a new approach for discovering filtered domains in different countries. This method is fully automated and requires no human interaction. The system uses web crawling techniques to traverse between filtered sites and implements a robust method for determining if a domain is filtered. We demonstrate the effectiveness of the approach by running experiments to search for filtered content in four different censorship regimes. Our results show that we perform better than the current state of the art and have built domain filter lists an order of magnitude larger than the most widely available public lists as of Jan 2018. Further, we build a dataset mapping the interlinking nature of blocked content between domains and exhibit the tightly networked nature of censored web resources. CCS CONCEPTS• Social and professional topics → Censorship; • Security and privacy → Social aspects of security and privacy; • Networks → Naming and addressing;
We develop a means to detect ongoing per-country anomalies in the daily usage metrics of the Tor anonymous communication network, and demonstrate the applicability of this technique to identifying likely periods of internet censorship and related events. The presented approach identifies contiguous anomalous periods, rather than daily spikes or drops, and allows anomalies to be ranked according to deviation from expected behaviour.The developed method is implemented as a running tool, with outputs published daily by mailing list. This list highlights percountry anomalous Tor usage, and produces a daily ranking of countries according to the level of detected anomalous behaviour. This list has been active since August 2016, and is in use by a number of individuals, academics, and NGOs as an early warning system for potential censorship events.We focus on Tor, however the presented approach is more generally applicable to usage data of other services, both individually and in combination. We demonstrate that combining multiple data sources allows more specific identification of likely Tor blocking events. We demonstrate the our approach in comparison to existing anomaly detection tools, and against both known historical internet censorship events and synthetic datasets. Finally, we detail a number of significant recent anomalous events and behaviours identified by our tool. CCS CONCEPTS• Networks → Network measurement; • Social and professional topics → Technology and censorship; • Security and privacy → Pseudonymity, anonymity and untraceability;
No abstract
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.