2012 IEEE 24th International Conference on Tools With Artificial Intelligence 2012
DOI: 10.1109/ictai.2012.75

An Agent-Based Focused Crawling Framework for Topic- and Genre-Related Web Document Discovery

Abstract: The discovery of web documents about certain topics is an important task for web-based applications including web document retrieval, opinion mining and knowledge extraction. In this paper, we propose an agent-based focused crawling framework able to retrieve topic- and genre-related web documents. Starting from a simple topic query, a set of focused crawler agents explore in parallel topic-specific web paths using dynamic seed URLs that belong to certain web genres and are collected from web search en…

Cited by 7 publications (7 citation statements)
References 17 publications
“…The measurement of execution time that is calculated as the time passed from starting of execution until the agents reach the predefined threshold of crawled webpages. 162 If there is no way to measure the actual number of webpages available, the total number of webpages collected by each crawler is used as metric. 172 Precision (Relevance) is judged by the human inspection which is biased and inconsistent.…”
Section: Performance Metrics For Focused Web Crawler (mentioning)
confidence: 99%
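
The execution-time and precision metrics described in the statement above can be illustrated with a minimal sketch. It assumes the crawler exposes its output as an iterable of pages and that relevance is decided by a caller-supplied predicate; `run_until_threshold`, `precision`, and `is_relevant` are hypothetical names introduced here for illustration and do not come from the cited papers.

```python
import time
from typing import Callable, Iterable, List, Tuple


def run_until_threshold(pages: Iterable[str], threshold: int) -> Tuple[List[str], float]:
    """Collect pages until `threshold` pages are gathered.

    Returns the collected pages and the elapsed wall-clock time in seconds,
    measured from the start of execution until the threshold is reached
    (or the page source is exhausted).
    """
    start = time.perf_counter()
    collected: List[str] = []
    if threshold <= 0:
        return collected, 0.0
    for page in pages:
        collected.append(page)
        if len(collected) >= threshold:
            break
    return collected, time.perf_counter() - start


def precision(collected: List[str], is_relevant: Callable[[str], bool]) -> float:
    """Fraction of collected pages judged relevant (0.0 for an empty crawl).

    When the true number of relevant pages on the web is unknown, the number
    of pages collected by the crawler serves as the denominator.
    """
    if not collected:
        return 0.0
    relevant = sum(1 for page in collected if is_relevant(page))
    return relevant / len(collected)


if __name__ == "__main__":
    # Hypothetical stand-in for a crawler's page stream.
    stream = ["a-topic", "b", "c-topic", "d", "e"]
    got, seconds = run_until_threshold(stream, threshold=4)
    print(len(got), precision(got, lambda p: "topic" in p))
```

In practice the relevance predicate is the weak point: as the statement notes, human judgment of relevance is biased and inconsistent, so any automated stand-in is only an approximation.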
“…Some of the other work on seed URL extraction and topic mapping are (i) Pappas et al [19] identified topics using dynamic seed URLs and evaluated topic relevance. In this work, the identification of seed URLs is manual and does not confirm representation of all subtopics of a topic.…”
Section: Literature Survey (mentioning)
confidence: 99%
“…The idea is that, given a query, up-to-date relevant documents can be retrieved from various domains and web-genres by following the path of a focused crawler, but also in a real-time manner. For the purposes of our system, [13] is especially suitable. It is an agent-based focused crawling framework that is able to retrieve topic-and genre-related web documents in an automated and real-time manner.…”
Section: Discovery Of Topic-related Web Documents (mentioning)
confidence: 99%
“…The Linkscore T and Linkscore G are relevance scores based on topic and genre accordingly; and they are computed by using link analysis techniques (see [13]). …”
Section: Discovery Of Topic-related Web Documents (mentioning)
confidence: 99%