2007
DOI: 10.1016/j.ipm.2006.10.011

Architecture of a grid-enabled Web search engine

Abstract: Search Engine for South-East Europe (SE4SEE) is a socio-cultural search engine running on the grid infrastructure. It offers a personalized, on-demand, country-specific, category-based Web search facility. The main goal of SE4SEE is to attack the page-freshness problem by performing the search on the original pages residing on the Web, rather than on previously fetched copies as done in traditional search engines. SE4SEE also aims to obtain high download rates in W…

Cited by 25 publications (8 citation statements)
References 36 publications
“…There are many research studies in web crawling, such as URL ordering for retrieving high-quality pages earlier [16], partitioning the web for efficient multi-processor crawling [17], distributed crawling [18], and focused crawling [19]. Several crawling architectures have been designed on top of grid computing to improve performance on specific web-crawling problems, such as a grid-based focused community crawling architecture for medical information retrieval services [20], Multi-Agent-System-based crawlers for Virtual Organizations [21], a dynamic URL assignment method for a parallel web crawler [22], a middleware for deep Web crawling using the grid [23], and the architecture of a grid-enabled Web search engine [24].…”
Section: Related Work
confidence: 99%
“…However, consistent hashing prevents the system from optimizing for network distance. In IPMicra [8], [9], a crawler is selected to crawl a given Web site if the two are located within the same AS or ISP network, according to information provided by Regional Internet Registries (RIRs); SE4SEE [10] reduces network distance by assigning to each crawler the Web sites located within that crawler's country. The network-distance metrics these two systems adopt (AS membership and geographical distance) cannot fully reveal the Web hosts' true positions on the Internet, because routers' routing strategies usually do not respect AS or city boundaries.…”
Section: Related Work
confidence: 99%
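The consistent-hashing scheme contrasted here assigns each Web site to a crawler by hashing its hostname onto a ring, so that adding or removing a crawler reassigns only a small fraction of sites — at the cost of ignoring network distance. A minimal sketch of this idea (the crawler names, replica count, and hash choice are illustrative assumptions, not from the cited systems):

```python
import hashlib
from bisect import bisect_right

class ConsistentHashRing:
    """Hypothetical site-to-crawler assignment via consistent hashing."""

    def __init__(self, crawlers, replicas=100):
        # Each crawler gets `replicas` virtual nodes on the ring to
        # smooth out load imbalance across crawlers.
        self.ring = sorted(
            (self._hash(f"{crawler}#{i}"), crawler)
            for crawler in crawlers
            for i in range(replicas)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        # Any stable hash works; MD5 is used here only for illustration.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def assign(self, host):
        # A site goes to the first crawler clockwise from its hash
        # position on the ring (wrapping around at the end).
        idx = bisect_right(self.keys, self._hash(host)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["crawler-a", "crawler-b", "crawler-c"])
print(ring.assign("example.org"))
```

The key property criticized in the quoted passage is visible here: the assignment depends only on the hash of the hostname, so two crawlers in different countries are equally likely to receive a given site, regardless of network distance.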
“…However, it prevents the system from optimizing for network distance. In IPMicra [8], a crawler is selected to crawl a given Web site if the two are located in the same AS or ISP network, according to information provided by Regional Internet Registries (RIRs); SE4SEE [27] reduces network distance by assigning to each crawler the Web sites located within that crawler's country; Apoidea [8] implements a Chord-based DWC system, but it makes no optimizations to reduce the crawler-host distance.…”
Section: Related Work
confidence: 99%