2007
DOI: 10.1016/j.is.2006.09.004

Combining text and link analysis for focused crawling—An application for vertical search engines

Cited by 75 publications (48 citation statements)
References 41 publications
“…Customized search engines can be categorized into general-purpose and topic-specific search engines [29]. Customized general-purpose search engines are equipped with a web crawler, which can continually collect large numbers of webpages by starting from a series of seed URLs (a list of URLs that the crawler starts with) and without a specific topic [32,41,42].…”
Section: Surface Geospatial Web Services Discovery (mentioning)
confidence: 99%
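The general-purpose crawling loop described in this excerpt can be illustrated with a short sketch: a breadth-first crawler that starts from a list of seed URLs and keeps collecting pages with no topic filter. This is not the method of the cited paper; the seed handling, FIFO frontier, page limit, and link extraction below are illustrative assumptions only.

from collections import deque
from urllib.parse import urljoin
from urllib.request import urlopen
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href values from anchor tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_urls, max_pages=100):
    """Breadth-first, general-purpose crawl: start from seed URLs and
    follow every discovered link, with no topic-specific filtering."""
    frontier = deque(seed_urls)   # unvisited URLs, first-in first-out
    visited = set()
    pages = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except Exception:
            continue              # skip unreachable or non-HTML pages
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            frontier.append(urljoin(url, link))  # resolve relative links
    return pages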
“…The main challenges in focused crawling relate to the prioritization of URLs not yet visited, which may be based on similarity measures [24,26], hyperlink distance-based limits [30,31], or combinations of text and hyperlink analysis with Latent Semantic Indexing (LSI) [32]. Machine learning approaches, including naïve Bayes classifiers [25,33], Hidden Markov Models [34], reinforcement learning [35], genetic algorithms [36], and neural networks [37], have also been applied to prioritize the unvisited URLs.…”
Section: Focused and Deep-web Crawling (mentioning)
confidence: 99%
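The URL prioritization described in this excerpt can be sketched with a priority-queue frontier. The cited paper combines text and link analysis through Latent Semantic Indexing; as a simpler stand-in, the sketch below ranks unvisited links by the cosine similarity between their anchor text and a topic description using plain term-frequency vectors. The fetch and extract_links helpers, the relevance threshold, and the page limit are assumptions for illustration only.

import heapq
import math
import re
from collections import Counter

def tf_vector(text):
    """Term-frequency vector for a piece of text (very simple tokenizer)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    shared = set(a) & set(b)
    num = sum(a[t] * b[t] for t in shared)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def focused_crawl(seed_urls, topic_description, fetch, extract_links, max_pages=100):
    """Focused crawl: unvisited URLs sit in a priority queue, ordered by the
    similarity of their anchor text to the topic description.
    `fetch(url)` and `extract_links(url, text)` are assumed helpers supplied
    by the caller; they are not part of the cited work."""
    topic = tf_vector(topic_description)
    frontier = [(-1.0, url, "") for url in seed_urls]   # seeds get top priority
    heapq.heapify(frontier)
    visited, relevant = set(), []
    while frontier and len(visited) < max_pages:
        neg_score, url, _anchor = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)
        page_text = fetch(url)            # assumed: returns page text or None
        if page_text is None:
            continue
        if cosine(tf_vector(page_text), topic) > 0.1:   # crude relevance cutoff
            relevant.append(url)
        for link, anchor_text in extract_links(url, page_text):  # assumed helper
            score = cosine(tf_vector(anchor_text), topic)
            heapq.heappush(frontier, (-score, link, anchor_text))
    return relevant

Replacing the term-frequency vectors with projections into an LSI space, and mixing a link-based score into the priority, would move this sketch closer to the combined text-and-link approach the cited paper proposes.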
“…The availability of a huge volume of data through the Internet has led to an increasing interest in knowledge retrieval from large amounts of hypertext (Almpanidis et al. 2007). Two methods for extracting links among websites are available today: crawlers and SEs.…”
Section: Data Extraction (mentioning)
confidence: 99%