Content filtering technologies are often used for Internet censorship, but even as these technologies have become cheaper and easier to deploy, the censorship measurement community lacks a systematic approach to monitor their proliferation. Past research has focused on a handful of specific filtering technologies, each of which required cumbersome manual detective work to identify. Researchers and policymakers require a more comprehensive picture of the state and evolution of censorship based on content filtering in order to establish effective policies that protect Internet freedom. In this work, we present FilterMap, a novel framework that can scalably monitor content filtering technologies based on their blockpages. FilterMap first compiles in-network and new remote censorship measurement techniques to gather blockpages from filter deployments. We then show how the observed blockpages can be clustered, generating signatures for longitudinal tracking. FilterMap outputs a map of regions of address space in which the same blockpages appear (corresponding to filter deployments), and each unique blockpage is manually verified to avoid false positives. By collecting and analyzing more than 379 million measurements from 45,000 vantage points against more than 18,000 sensitive test domains, we are able to identify filter deployments associated with 90 vendors and actors and observe filtering in 103 countries. We detect the use of commercial filtering technologies for censorship in 36 out of 48 countries labeled as 'Not Free' or 'Partly Free' by the Freedom House "Freedom on the Net" report [26]. The unrestricted transfer of content filtering technologies have led to high availability, low cost, and highly effective filtering techniques becoming easier to deploy and harder to circumvent. Identifying these filtering deployments highlights policy and corporate social responsibility issues, and adds accountability to filter manufacturers. Our continued publication of FilterMap data will help the international community track the scope, scale and evolution of content-based censorship.
This study offers a first step toward understanding the extent to which we may be able to predict cyber security incidents (which can be of one of many types) by applying machine learning techniques and using externally observed malicious activities associated with network entities, including spamming, phishing, and scanning, each of which may or may not have direct bearing on a specific attack mechanism or incident type. Our hypothesis is that when viewed collectively, malicious activities originating from a network are indicative of the general cleanness of a network and how well it is run, and that furthermore, collectively they exhibit fairly stable and thus predictive behavior over time. To test this hypothesis, we utilize two datasets in this study: (1) a collection of commonly used IP address-based/host reputation blacklists (RBLs) collected over more than a year, and (2) a set of security incident reports collected over roughly the same period. Specifically, we first aggregate the RBL data at a prefix level and then introduce a set of features that capture the dynamics of this aggregated temporal process. A comparison between the distribution of these feature values taken from the incident dataset and from the general population of prefixes shows distinct differences, suggesting their value in distinguishing between the two while also highlighting the importance of capturing dynamic behavior (second order statistics) in the malicious activities. These features are then used to train a support vector machine (SVM) for prediction. Our preliminary results show that we can achieve reasonably good prediction performance over a forecasting window of a few months.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.