2008
DOI: 10.1145/1402946.1402991

Unconstrained endpoint profiling (googling the internet)

Abstract: Understanding Internet access trends at a global scale, i.e., what do people do on the Internet, is a challenging problem that is typically addressed by analyzing network traces. However, obtaining such traces presents its own set of challenges owing to either privacy concerns or to other operational difficulties. The key hypothesis of our work here is that most of the information needed to profile the Internet endpoints is already available around us: on the web. In this paper, we introduce a novel approach fo…

Citations: cited by 31 publications (34 citation statements)
References: 22 publications
“…This is particularly important for applications such as P2P, which can be more challenging [7], especially at the backbone [4]. Figure 4 summarizes our results.…”
Section: B Experimental Results (mentioning)
confidence: 66%
“…For example, in the case of classification conflicts, we could incorporate the classifier with the better accuracy or precision, especially since some methods are better for different applications. (c) We can query search engines for IP addresses that we want to classify, in essence, "Googling the Internet" [7], thus harnessing the power of the Web. (d) We can use active and passive measurement techniques for the seeding process.…”
Section: Discussion (mentioning)
confidence: 99%
“…Whereas, BLINC [18] decides a label for a source (IP address, source port) based on its connection pattern. Also, a recent method [19] labels directly hosts without any traffic information by collecting and analyzing information freely available on the web. Thus, researchers faced difficulties in comparing events standing for flows with events representing hosts.…”
Section: Granularity Of Events (mentioning)
confidence: 99%
“…The assumption is that if a host is an SMTP server, all the flows generated from this host toward port 25 are mail traffic. In general, this heuristic is applicable for P2P traffic as long as the information about the port number can be utilized and the assumption of the heuristic can be validated. In our experience, for example, there is a large number of eDonkey flows which can be identified using port 4662 and for BitTorrent on port 6881.…”
Section: The Verification Process (In Iterations) (mentioning)
confidence: 99%
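
The host-plus-port heuristic described in the statement above can be illustrated with a minimal Python sketch. This is not the citing authors' code: the `Flow` type, the `label_flow` function, and the `known_hosts` mapping are hypothetical names introduced here, and the port table contains only the three ports named in the quote. The sketch assumes a flow is labeled only when its source host is already known to run the application associated with the destination port.

```python
from typing import Dict, NamedTuple, Optional

# Ports named in the quoted statement; the application names are
# labels chosen for this illustration.
PORT_TO_APP: Dict[int, str] = {
    25: "smtp",
    4662: "edonkey",
    6881: "bittorrent",
}


class Flow(NamedTuple):
    """Hypothetical flow record: only the fields the heuristic needs."""
    src_ip: str
    dst_ip: str
    dst_port: int


def label_flow(flow: Flow, known_hosts: Dict[str, str]) -> Optional[str]:
    """Label a flow when the source host is known to run the application
    associated with the destination port; otherwise return None.

    known_hosts maps a host IP to the application it was profiled as
    (an assumed input, e.g. produced by an earlier profiling step).
    """
    app = PORT_TO_APP.get(flow.dst_port)
    if app is None:
        return None  # port not covered by the heuristic
    if known_hosts.get(flow.src_ip) == app:
        return app  # host profile and port agree
    return None  # port alone is not treated as sufficient evidence


if __name__ == "__main__":
    hosts = {"10.0.0.1": "smtp", "10.0.0.4": "bittorrent"}
    for f in (
        Flow("10.0.0.1", "10.0.0.2", 25),
        Flow("10.0.0.3", "10.0.0.2", 25),
        Flow("10.0.0.4", "10.0.0.9", 6881),
    ):
        print(f, "->", label_flow(f, hosts))
```

Requiring agreement between the host profile and the port, rather than trusting the port alone, reflects the quote's caveat that the heuristic applies only when its assumption can be validated.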
“…On the technical aspects, our work can be seen as a cumulative progress, with lots of inspirations from previous traffic classification works, including [2, 3, 6, 7, 8, 9].…”
Section: Related Work (mentioning)
confidence: 99%