Several data protection laws include special provisions for protecting personal data relating to religion, health, sexual orientation, and other sensitive categories. Having a well-defined list of sensitive categories is sufficient for filing complaints manually, conducting investigations, and prosecuting cases in courts of law. Data protection laws, however, do not define explicitly what type of content falls under each sensitive category. Therefore, it is unclear how to implement proactive measures such as informing users, blocking trackers, and filing complaints automatically when users visit sensitive domains. To enable such use cases, we turn to the Curlie.org crowdsourced taxonomy project to draw training data for building a text classifier for sensitive URLs. We demonstrate that our classifier can identify sensitive URLs with accuracy above 88%, and even recognize specific sensitive categories with accuracy above 90%. We then use our classifier to search for sensitive URLs in a corpus of 1 billion URLs collected by the Common Crawl project. We identify more than 155 million sensitive URLs in more than 4 million domains. Despite their sensitive nature, more than 30% of these URLs belong to domains that fail to use HTTPS. Also, in sensitive web pages with third-party cookies, 87% of the third parties set at least one persistent cookie. CCS CONCEPTS: • Security and privacy → Privacy protections; • Information systems → World Wide Web; • Networks → Network measurement.
In recent years, governments worldwide have moved their services online to better serve their citizens. Benefits aside, this choice increases the danger of tracking via such sites. This is of great concern as governmental websites increasingly become the only interaction point with the government. In this paper, we investigate popular governmental websites across different countries and assess to what extent visits to these sites are tracked by third parties. Our results show that, unfortunately, tracking is a serious concern: in some countries, up to 90% of these websites create cookies of third-party trackers without any consent from users. Non-session cookies, which are created by trackers and can last for days or months, are widely present even in countries with strict user privacy laws. We also show that the same problem affects official websites of international organizations and popular websites that inform the public about the COVID-19 pandemic. CCS CONCEPTS: • Information systems → World Wide Web; • Security and privacy → Human and societal aspects of security and privacy.
Bridges are onion routers in the Tor network whose IP addresses are not public. So far, no global security analysis of Tor bridges has been performed. Leveraging public data sources and two known Tor issues, we perform the first systematic study on the security of the Tor bridge infrastructure. Our study covers both the public infrastructure available to all Tor users and the previously unreported private infrastructure, comprising private nodes for the exclusive use of those who know of their existence. Our analysis of the public infrastructure is twofold. First, we examine the security implications of the public data in the CollecTor service, identifying several pieces of data that may be detrimental to the security of bridges. Then, we measure security-relevant properties of public bridges. Our results show that 55% of public bridges that carry clients are vulnerable to aggressive blocking; that 90% of bridge clients use default bridges that are trivial to identify; that the concurrent deployment of Pluggable Transports in bridges reduces the security of the most secure transports; and that running non-Tor services on the same host as a bridge may harm its anonymity. To study the private infrastructure, we use an approach to discover 694 private bridges on the Internet and a novel technique to track bridges across IP changes. We are the first to measure the size of the private bridge population (35% of discovered bridges are private) and to report the existence of infrastructures that use private proxies to forward traffic to backend bridges or relays. We use a novel clustering approach to analyze the different infrastructures using proxies and bridges, examining their hosting and security properties. We provide an extensive discussion of the security implications of our findings. Permission to freely reproduce all or part of this paper for noncommercial purposes is granted provided that copies bear this notice and the full citation on the first page.
Reproduction for commercial purposes is strictly prohibited without the prior written consent of the Internet Society, the first-named author (for reproduction of an entire paper only), and the author's employer if the paper was prepared within the scope of employment.