Proceedings of the 25th ACM Conference on Hypertext and Social Media 2014
DOI: 10.1145/2631775.2631795

Scalable, generic, and adaptive systems for focused crawling

Abstract: Focused crawling is the process of exploring a graph iteratively, focusing on parts of the graph relevant to a given topic. It occurs in many situations such as a company collecting data on competition, a journalist surfing the Web to investigate a political scandal, or an archivist recording the activity of influential Twitter users during a presidential election. In all these applications, users explore a graph (e.g., the Web or a social network), nodes are discovered one by one, the total number of explorat…
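
To make the idea of iterative, relevance-driven graph exploration concrete, here is a minimal sketch of a best-first focused crawl. It is an illustration only, not the method of the paper: the function name `focused_crawl`, the `budget` parameter, the toy `graph`, and the fixed `scores` table are all assumptions introduced for this example. In a real crawl the relevance of an unvisited node is unknown and has to be estimated; here a precomputed score stands in for such an estimate.

```python
import heapq
from typing import Callable, Hashable, Iterable, List, Set, Tuple


def focused_crawl(
    seed: Hashable,
    neighbors: Callable[[Hashable], Iterable[Hashable]],
    relevance: Callable[[Hashable], float],
    budget: int,
) -> List[Hashable]:
    """Visit at most `budget` nodes, always expanding the highest-scored known node.

    `relevance` is a hypothetical stand-in for an estimator of how relevant
    a not-yet-visited node is to the topic.
    """
    # heapq is a min-heap, so scores are negated; the counter breaks ties
    # in discovery order and avoids comparing node objects directly.
    frontier: List[Tuple[float, int, Hashable]] = [(-relevance(seed), 0, seed)]
    seen: Set[Hashable] = {seed}
    visited: List[Hashable] = []
    counter = 1
    while frontier and len(visited) < budget:
        _, _, node = heapq.heappop(frontier)
        visited.append(node)
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-relevance(nxt), counter, nxt))
                counter += 1
    return visited


# Toy graph and scores (assumed values, for illustration only).
graph = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": [], "e": []}
scores = {"a": 1.0, "b": 0.2, "c": 0.9, "d": 0.1, "e": 0.8}

order = focused_crawl("a", lambda n: graph[n], lambda n: scores[n], budget=4)
print(order)
```

With these toy values the crawl follows the high-scoring branch through `c` and `e` before returning to `b`, and never spends budget on `d`.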

Cited by 12 publications (20 citation statements)
References 12 publications

“…Then, a page from the host is taken using the online classifier. Similarly, Gouriten et al [12] use bandits to choose estimators for scoring the frontier.…”
Section: Related Work (mentioning)
confidence: 99%
“…Since we want to find the reputation of an entity and our goal is the richness and relevance of the sample, we find these methods not suitable. The idea of focused crawling related to a specific topic, based on weights, has been used in [12] and expert sampling in [10]. The study provided by [6] underlines the importance of the retweets and of the mentions in judging about the influence of the users.…”
Section: Weighted Sampling (mentioning)
confidence: 99%
“…The most important question that we should pose is: "Do we went a statistically representative sample that aligns with the real large Twitter dataset or do we want a filtered sample that focuses on the relevant tweets?". Several papers have contributed to find a statistically representative sample [6,8], while others highlight focused on crawling ( [9]) or Expert Sampling ( [7]). Inspired by the latest work, we consider three main parameters that influence the quality of the tweet: the number of times the tweet is retweeted, the favorite count of the tweet, and the number of followers of the user that has tweeted.…”
Section: Weighted Sampling (mentioning)
confidence: 99%