1994
DOI: 10.1016/0169-7552(94)90132-5

Information retrieval in the World-Wide Web: Making client-based searching feasible

Cited by 154 publications (64 citation statements)
References 1 publication
“…Crawlers and agents have grown more sophisticated [11]. To our knowledge the earliest example of using a query to direct a limited Web crawl is the Fish Search system [14]. Similar results are reported for the WebCrawler [11, chapter 4], Shark Search [17], and by Chen et al. [10].…”
Section: Related Work (supporting)
confidence: 60%
“…The main challenges in focused crawling relate to the prioritization of URLs not yet visited, which may be based on similarity measures [24,26], hyperlink distance-based limits [30,31], or combinations of text and hyperlink analysis with Latent Semantic Indexing (LSI) [32]. Machine learning approaches, including naïve Bayes classifiers [25,33], Hidden Markov Models [34], reinforcement learning [35], genetic algorithms [36], and neural networks [37], have also been applied to prioritize the unvisited URLs.…”
Section: Focused and Deep-Web Crawling (mentioning)
confidence: 99%
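The similarity-driven prioritization mentioned in that excerpt can be illustrated with a short best-first crawler sketch. This is a minimal illustration under simplifying assumptions, not code from any of the cited systems: the fetch_page and extract_links callables and the bag-of-words cosine score are hypothetical stand-ins for whatever fetching, parsing, and similarity measures a real focused crawler would use.

```python
# Minimal sketch of similarity-based URL prioritization for a focused crawler.
# The frontier is a priority queue keyed by how well the text around each
# discovered link matches the query. fetch_page() and extract_links() are
# hypothetical placeholders supplied by the caller.

import heapq
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Bag-of-words cosine similarity between two strings."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def focused_crawl(seed_urls, query, fetch_page, extract_links, max_pages=100):
    """Best-first crawl: always expand the unvisited URL that best matches the query."""
    frontier = [(-1.0, url) for url in seed_urls]   # heapq is a min-heap, so scores are negated
    heapq.heapify(frontier)
    visited, results = set(), []

    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)

        page_text = fetch_page(url)                 # assumed to return the page body as plain text
        results.append((url, cosine_similarity(query, page_text)))

        for link_url, anchor_text in extract_links(page_text):
            if link_url not in visited:
                # Priority of an unvisited URL: similarity of its anchor text plus
                # the parent page to the query (one of many possible measures).
                priority = cosine_similarity(query, anchor_text + " " + page_text)
                heapq.heappush(frontier, (-priority, link_url))

    return sorted(results, key=lambda r: -r[1])
```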
“…In some early works on the subject of focused collection of data from the Web, web crawling was simulated by a group of fish migrating on the Web [3]. In the so-called fish search, each URL corresponds to a fish whose survivability depends on the relevance of the visited page and the speed of the remote server.…”
Section: Related Work (mentioning)
confidence: 99%
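A rough sketch of the fish-search idea described in that excerpt, under simplifying assumptions: each URL is a fish carrying a remaining depth, relevant pages spawn more offspring at full depth, irrelevant pages spawn fewer at reduced depth, and unreachable servers kill the fish. The helpers is_relevant, fetch_page, and extract_links are hypothetical placeholders, not the original system's interface.

```python
# Simplified fish-search sketch: a school of fish (URLs) whose energy (depth)
# decays when pages are irrelevant and which dies on unresponsive servers.
# is_relevant(), fetch_page() and extract_links() are assumed caller-supplied helpers.

from collections import deque


def fish_search(seed_urls, is_relevant, fetch_page, extract_links,
                depth=3, width=5, max_pages=100):
    school = deque((url, depth) for url in seed_urls)   # the "school" of fish
    visited, hits = set(), []

    while school and len(visited) < max_pages:
        url, remaining_depth = school.popleft()
        if url in visited or remaining_depth <= 0:
            continue
        visited.add(url)

        page = fetch_page(url)        # assumed to return None for a dead or too-slow server
        if page is None:
            continue                  # the fish dies: unreachable servers end exploration here

        relevant = is_relevant(page)
        if relevant:
            hits.append(url)

        # Relevant pages spawn `width` children at full depth; irrelevant pages
        # spawn fewer children and pass on one less unit of depth.
        child_depth = depth if relevant else remaining_depth - 1
        child_width = width if relevant else width // 2
        for child in list(extract_links(page))[:child_width]:
            if child not in visited:
                school.append((child, child_depth))

    return hits
```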
“…In the last years of the 1990s, Alta Vista's crawler, called Scooter, ran on a 4x533MHz AlphaServer 4100 5/300 with 1.5GB of memory, a 30GB RAID disk, and 1 GB/s of I/O bandwidth². In spite of these heroic efforts with high-end multiprocessors and clever crawling software, the largest crawls cover only 30-40% of the web, and refreshes take weeks to a month³. The Web in many ways simulates a social network: links do not point to pages at random but reflect the page authors' idea of what other relevant or interesting pages exist.…”
Section: Introduction (mentioning)
confidence: 99%