2005
DOI: 10.1007/s10791-005-6993-5

A General Evaluation Framework for Topical Crawlers

Abstract: Topical crawlers are becoming important tools to support applications such as specialized Web portals, online searching, and competitive intelligence. As the Web mining field matures, the disparate crawling strategies proposed in the literature will have to be evaluated and compared on common tasks through well-defined performance measures. This paper presents a general framework to evaluate topical crawlers. We identify a class of tasks that model crawling applications of different nature and difficulty. We t…

Cited by 88 publications (65 citation statements)
References 32 publications (81 reference statements)
“…The topic can consist of one or more seeds or even a short query. Preferential crawlers that start with only such information are often called topical crawlers [7,11,16]. They do not have text [3] classifiers to guide crawling.…”
Section: Fig 12 Parsed Tree Of Html Code (mentioning)
confidence: 99%
“…They also use naïve Bayesian classifiers as a guide, but in this case the classifiers are trained to estimate the link distance between a crawled page and a set of relevant target pages [18]. A topical crawler [7,11,16], a MySpiders applet, is designed to demonstrate two topical crawling algorithms [1,7,11,16], best-N-first and InfoSpiders [10]. It is interactive in that a user submits a query and the Web is crawled in real time.…”
Section: Fig 12 Parsed Tree Of Html Code (mentioning)
confidence: 99%
“…The study of the topology of the topical graphs in the Web has consequences on focused crawling and free exploration of Web pages or exploitation of already discovered resources [5]. Interesting works attempt to formulate the numerous variants involved in the design of this kind of system and to propose suited evaluation frameworks and metrics to research teams working in the field [6] [7].…”
Section: A Focused Crawling and Artificial Life (mentioning)
confidence: 99%
“…This method can make the formulization of a topic easier and intuitive, but it cannot express the semantic information of keywords. The second method uses an existing classified catalogue (such as Open Directory Project and Yahoo Directory) to define a given topic (Pant and Menczer, 2003;Srinivasan et al, 2005). The method could describe not only detailed information on topic itself but also some semantic information.…”
Section: Introduction (mentioning)
confidence: 99%