Proceedings 2019 Network and Distributed System Security Symposium 2019
DOI: 10.14722/ndss.2019.23386

Tranco: A Research-Oriented Top Sites Ranking Hardened Against Manipulation

Abstract: In order to evaluate the prevalence of security and privacy practices on a representative sample of the Web, researchers rely on website popularity rankings such as the Alexa list. While the validity and representativeness of these rankings are rarely questioned, our findings show the contrary: we show for four main rankings how their inherent properties (similarity, stability, representativeness, responsiveness and benignness) affect their composition and therefore potentially skew the conclusions made in stu…

Cited by 356 publications (254 citation statements)
References 32 publications
“…The advantage of our collaboration with law enforcement is that we can use their manual classification of benign and malicious domains from the takedown as a trustworthy source of ground truth. Previous studies mostly rely on publicly available blacklists and whitelists as the labeled ground truth [89], but malware blacklists have been found to contain benign parked or sinkholed domains and are ineffective at fully covering domains of several malware families [54], while lists of popular domains commonly used as whitelists can easily be manipulated by malware providers [56].…”
Section: Ground Truth Data (mentioning)
confidence: 99%
“…Malicious behavior can be detected and publicly shared in blacklists [54], [81], [101]. Commercial providers publish lists of the most popular websites that become base sets of seemingly benign domains [56]. The service may be crawled to populate search engine results or archive web content [37]: the latter enables longitudinal analyses of malicious activity [12], [83], [101].…”
Section: L3 DNS Configuration (mentioning)
confidence: 99%
“…We approximate this number based on the result from a prior study showing that most Internet users visit an average of 89 domains per month [49]. We sample the domains from the top list of one million popular websites obtained from the Tranco project on December 12th, 2019 [28]. To simulate a real-world scenario, for each user profile, we select the domains based on their popularity ranking instead of randomly picking them from the top list.…”
Section: A Domain Name Dataset (mentioning)
confidence: 99%
“…To examine the impact of browser caches with respect to the current state of the web, an extensive analysis is carried out. The top 500 URLs of Tranco [20] retrieved on August 19, 2019 serves as data basis. It results from a combination of well-known website ranking lists like Alexa, Cisco Umbrella, Majestic and Quantcast which has been prepared as a reliable and manipulation-free database for security and privacy research.…”
Section: Browser Cache Experiments (mentioning)
confidence: 99%
“…In order to quantify this danger, the records are analysed to find cached resources that are shared among several websites. We again use the top 500 websites by Tranco [20] and capture the HTTP traffic with activated browser cache. The logs are analysed as follows.…”
Section: Current Security and Privacy Risk Assessment (mentioning)
confidence: 99%