Proceedings of the 2005 Workshop on Geographic Information Retrieval
DOI: 10.1145/1096985.1096999

Geographical partition for distributed web crawling

Abstract: This paper evaluates scalable distributed crawling by means of the geographical partition of the Web. The approach is based on the existence of multiple distributed crawlers, each responsible for the pages belonging to one or more previously identified geographical zones. The work considers a distributed crawler where the assignment of pages to visit is based on the geographical scope of page content. For the initial assignment of a page to a partition we use a simple heuristic that marks a page within the same sc…
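The assignment heuristic is truncated in the abstract above, but the general idea, routing each page to the crawler responsible for its geographic zone, can be illustrated with a minimal sketch. The zone names, the TLD-to-partition table, and the inherit-from-parent fallback below are illustrative assumptions for the sketch, not the paper's actual heuristic:

from urllib.parse import urlparse

# Hypothetical mapping from country-code TLDs to crawler partitions
# (illustrative zone names; the paper's actual zones are not shown here).
CCTLD_TO_PARTITION = {"pt": "iberia", "es": "iberia", "fr": "france"}
DEFAULT_PARTITION = "global"

def assign_partition(url, parent_partition=None):
    """Return the geographic partition responsible for crawling `url`.

    Assumption: a URL with a country-code TLD carries an explicit
    geographic signal; otherwise the page inherits the scope of the
    page that linked to it, falling back to a default partition.
    """
    host = urlparse(url).hostname or ""
    tld = host.rsplit(".", 1)[-1]
    if tld in CCTLD_TO_PARTITION:
        return CCTLD_TO_PARTITION[tld]
    if parent_partition is not None:
        return parent_partition
    return DEFAULT_PARTITION

print(assign_partition("http://example.pt/"))             # iberia (TLD signal)
print(assign_partition("http://example.org/", "iberia"))  # iberia (inherited)
print(assign_partition("http://example.org/"))            # global (fallback)

In a distributed setting, each crawler would presumably enqueue only URLs whose partition matches its own zone and forward the rest to the responsible peer.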


Cited by 26 publications (10 citation statements). References 9 publications.
“…Exposto et al. [28] evaluated scalable distributed crawling by means of the geographical partition of the Web. The approach is based on the existence of multiple distributed crawlers, each responsible for the pages belonging to one or more previously identified geographical zones.…”
Section: Literature Review
confidence: 99%
“…In the literature, there have been a significant number of design alternatives, including sequential [24], parallel [5,8,20,33,36], and geographically distributed [6,14,15] Web crawlers. The three main quality objectives, common to most crawling architectures, were achieving high collection quality through download scheduling [10,26], maintaining page freshness [7,9,16,30,35], and obtaining high Web coverage [11,23].…”
Section: Previous Work
confidence: 99%
“…Exposto et al try to find the optimal locations for several Web crawlers considering the data volume and the time spent crawling [17]. Li et al study the feasibility of P2P Web search engines in terms of network bandwidth and storage space on the peers [24], and conclude that Web search using P2P technology still requires an order of magnitude more resources than available, despite a range of considered performance optimizations.…”
Section: Related Work
confidence: 99%