Abstract: Recent studies have shown that Tor onion (hidden) service websites are particularly vulnerable to website fingerprinting attacks due to their limited number and sensitive nature. In this work we present a multi-level feature analysis of onion site fingerprintability, considering three state-of-the-art website fingerprinting methods and 482 Tor onion services, making this the largest analysis of this kind completed on onion services to date. Prior studies typically report average performance results for a given …
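To make the attack setting concrete, the sketch below shows a minimal closed-world website fingerprinting classifier over coarse traffic features (packet counts, per-direction volumes, duration). It is only an illustration of the general technique: the feature extraction and the random-forest classifier here are assumptions chosen for the example, not the three attacks evaluated in the paper, whose feature sets are far richer.

    # Minimal sketch of a feature-based website fingerprinting classifier.
    # Hypothetical coarse features; real attacks use much richer representations.
    from typing import List, Tuple
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    # A trace is a sequence of (timestamp, signed packet size):
    # positive sizes are outgoing packets, negative sizes are incoming.
    Trace = List[Tuple[float, int]]

    def coarse_features(trace: Trace) -> np.ndarray:
        sizes = np.array([s for _, s in trace], dtype=float)
        out, inc = sizes[sizes > 0], sizes[sizes < 0]
        duration = trace[-1][0] - trace[0][0] if len(trace) > 1 else 0.0
        return np.array([
            len(sizes),          # total packets
            len(out), len(inc),  # packets per direction
            out.sum(), -inc.sum(),  # bytes per direction
            duration,            # trace duration in seconds
        ])

    def evaluate(traces: List[Trace], labels: List[int]) -> float:
        """Closed-world accuracy estimate via cross-validation."""
        X = np.stack([coarse_features(t) for t in traces])
        y = np.array(labels)
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        return cross_val_score(clf, X, y, cv=5).mean()

In a closed-world evaluation of this kind, each label is one of the monitored onion sites and accuracy is estimated by cross-validation over the collected traces.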
“…Crawls are a standard way of collecting data for Web measurements. Many crawl-based studies focus on security and privacy issues, such as online tracking [2,17], detecting the use of browser fingerprinting [3,26,40], detecting the use of sensitive APIs [13], the tracking ecosystem of internet-enabled devices [34], dark patterns on shopping websites [28], content security violations [55], the effectiveness of tracker blockers [30], GDPR compliance [8], and many others [43,57]. The OpenWPM framework has been used in over 47 such crawl-based studies [59].…”
Large-scale Web crawls have emerged as the state of the art for studying characteristics of the Web. In particular, they are a core tool for online tracking research. Web crawling is an attractive approach to data collection, as crawls can be run at relatively low infrastructure cost and do not require handling sensitive user data such as browsing histories. However, the biases introduced by using crawls as a proxy for human browsing data have not been well studied. Crawls may fail to capture the diversity of user environments, and the snapshot view of the Web presented by one-time crawls does not reflect its constantly evolving nature, which hinders reproducibility of crawl-based studies. In this paper, we quantify the repeatability and representativeness of Web crawls in terms of common tracking and fingerprinting metrics, considering both variation across crawls and divergence from human browser usage. We quantify the baseline variation between simultaneous crawls, then isolate the effects of time, cloud vs. residential IP address, and operating system. This provides a foundation to assess the agreement between crawls visiting a standard list of high-traffic websites and actual browsing behaviour measured from an opt-in sample of over 50,000 users of the Firefox Web browser. Our analysis reveals differences between the treatment of stateless crawling infrastructure and generally stateful human browsing, showing, for example, that crawlers tend to experience higher rates of third-party activity than human browser users when loading pages from the same domains.
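As an illustration of one of the metrics discussed, the sketch below estimates the share of third-party requests on a page load by comparing each request's domain with the page's domain. The helper names and the naive registrable-domain heuristic are assumptions made for the example, not the paper's actual analysis code; a real measurement pipeline would resolve registrable domains against the Public Suffix List.

    # Illustrative sketch: estimating the third-party request rate for a page load.
    # NOTE: naive_registrable_domain() is a simplification; real measurements
    # should use the Public Suffix List (e.g. via a library such as tldextract).
    from typing import Dict, List
    from urllib.parse import urlparse

    def naive_registrable_domain(host: str) -> str:
        parts = host.lower().rstrip(".").split(".")
        return ".".join(parts[-2:]) if len(parts) >= 2 else host

    def third_party_rate(page_url: str, request_urls: List[str]) -> float:
        page_dom = naive_registrable_domain(urlparse(page_url).hostname or "")
        third = [u for u in request_urls
                 if naive_registrable_domain(urlparse(u).hostname or "") != page_dom]
        return len(third) / len(request_urls) if request_urls else 0.0

    # Hypothetical crawl log: page URL -> URLs of resources requested during the load.
    crawl_log: Dict[str, List[str]] = {
        "https://example.com/": [
            "https://example.com/app.js",
            "https://cdn.tracker-example.net/pixel.gif",
        ],
    }
    for page, requests in crawl_log.items():
        print(page, third_party_rate(page, requests))

Aggregating such per-page rates across crawl runs, or against opt-in user telemetry, is one way the kind of crawler-versus-user comparison described above could be expressed.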
“…It makes the generated flow indistinguishable from the target flow, achieving privacy protection and censorship evasion. In addition, GANs are moving toward more dynamic, adaptive traffic camouflage by being combined with other supervised learning methods [36][37][38][39].…”
Section: GAN and Its Application in Traffic
In the intelligent era of human-computer symbiosis, the use of machine learning methods for covert-communication confrontation has become a hot topic in network security. Existing covert communication techniques focus on the statistical abnormality of traffic behavior and do not consider the sensory abnormality perceived by security censors, so they face the core problem of lacking cognitive ability. To further improve the concealment of communication, a game method of “cognitive deception” is proposed, aimed at eliminating traffic anomalies in both the behavioral and cognitive dimensions. Accordingly, a Wasserstein Generative Adversarial Network of Covert Channel (WCCGAN) model is established. The model uses constrained sampling from cognitive priors to construct constraint mechanisms of “functional equivalence” and “cognitive equivalence”, and is trained with a dynamic strategy-updating learning algorithm. The generative module adopts joint representation learning that integrates network-protocol knowledge to improve the expressiveness and discriminability of cognitive traffic features. The equivalence module guides the discriminative module to learn pragmatically relevant features through a traffic activity loss function and a protocol application loss function for end-to-end training. The experimental results show that WCCGAN can directly synthesize traffic with comprehensive concealment ability, with behavioral concealment and cognitive deception reaching 86.2% and 96.7%, respectively. Moreover, the model has good convergence and generalization ability and does not depend on specific assumptions or specific covert algorithms, realizing a new paradigm of cognitive games in covert communication.
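For orientation, the sketch below shows a generic Wasserstein GAN training loop over fixed-length traffic feature vectors in PyTorch. It is only a baseline illustration of the WGAN objective, not the authors' WCCGAN: it omits the functional/cognitive equivalence modules, protocol-aware losses, and dynamic strategy updating described in the abstract, and the network sizes and the sample_real placeholder are assumptions for the example.

    # Generic WGAN training loop over fixed-length traffic feature vectors.
    # NOT the paper's WCCGAN: equivalence constraints and protocol losses are omitted.
    import torch
    import torch.nn as nn

    FEAT_DIM, NOISE_DIM, CLIP = 64, 32, 0.01

    G = nn.Sequential(nn.Linear(NOISE_DIM, 128), nn.ReLU(), nn.Linear(128, FEAT_DIM))
    C = nn.Sequential(nn.Linear(FEAT_DIM, 128), nn.ReLU(), nn.Linear(128, 1))  # critic

    opt_g = torch.optim.RMSprop(G.parameters(), lr=5e-5)
    opt_c = torch.optim.RMSprop(C.parameters(), lr=5e-5)

    def sample_real(batch: int) -> torch.Tensor:
        # Placeholder for real (target) traffic features; replace with a DataLoader.
        return torch.randn(batch, FEAT_DIM)

    for step in range(1000):
        # Train the critic several steps per generator step (standard WGAN schedule).
        for _ in range(5):
            real, noise = sample_real(64), torch.randn(64, NOISE_DIM)
            loss_c = -(C(real).mean() - C(G(noise).detach()).mean())
            opt_c.zero_grad(); loss_c.backward(); opt_c.step()
            for p in C.parameters():  # weight clipping keeps the critic ~1-Lipschitz
                p.data.clamp_(-CLIP, CLIP)
        noise = torch.randn(64, NOISE_DIM)
        loss_g = -C(G(noise)).mean()  # generator pushes critic scores on fakes up
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()

The critic is trained more often than the generator, and weight clipping keeps it approximately 1-Lipschitz, which is what lets its output serve as an estimate of the Wasserstein distance between generated and target traffic distributions.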
“…Research papers by focus (N. of papers): Attacks to THS: [40], [41], [42], [43], [44], [45], [47], [48], [50], [51], [52], [53], [54], [56], [57], [21], [58], [63], [49], [19] (20 papers)…”
Section: Focus of the Research
“…Another paper documenting a fingerprinting attack [48] notes that the larger a .onion site is, i.e., the more content it offers, the more susceptible it is to being tracked, while sites that are more difficult to identify tend to be small and dynamic.…”
Anonymous communication networks were born to protect the privacy of our communications, preventing censorship and traffic analysis. The most famous anonymous communication network is Tor. It provides several interesting features, among them concealment of the user's IP location and Tor Hidden Services (THS), a mechanism to conceal the location of servers, mainly web servers. THS is an important research field in Tor. However, there is a lack of reviews that sum up the main findings and research challenges. In this article we present a systematic literature review that aims to offer a comprehensive view of the research on Tor Hidden Services, presenting the state of the art and the different research challenges to be addressed. The review has been developed from a selection of 57 articles and presents the main findings and advances regarding Tor Hidden Services, the limitations found, and future issues to be investigated.