A focused crawler combinatory link and content model based on T-Graph principles

Seyfi, Ali; Patel, Ahmed

doi:10.1016/j.csi.2015.07.001

Cited by 4 publications

(4 citation statements)

References 26 publications

“…1 illustrates the framework architecture of a Treasure-Crawler-based search engine and its modules which satisfy all the functional and non-functional requirements. The details of this architecture are given in an under-review paper titled as: "A Focused Crawler Combinatory Link and Content Model Based on T-Graph Principles" [1]. These modules are designed in a way to have the ability of being plugged and played while requiring minimum changes in other modules or the adjacent module interfaces.…”

Section: Methods (mentioning)

confidence: 99%

“…[8] In addition to the above procedure, the T-Graph structure as an exemplary guide carries out the task of priority association by providing a conceptual route for the crawler to follow and find on-topic regions. This phase is elaborated in [1].…”

Section: Watchdog (mentioning)

confidence: 99%

“…o improve the quality of searching and indexing the Web, our proposed focused crawler depends on two main objectives, namely, to predict the topic of an unvisited page, and to prioritize the unvisited URLs within the current page by using a data structure called T-Graph. We elaborated the architecture of the Treasure-Crawler in a preceding paper [1], where a review on this subject field was discussed by naming and briefly describing some significant Web crawlers. Also, the requirements of a focused crawler were elicited and the evaluation criteria were outlined.…”

Section: Introduction (mentioning)

confidence: 99%

Empirical evaluation of the link and content-based focused Treasure-Crawler

Seyfi, Patel, Júnior. Computer Standards & Interfaces, 2016 (self-citation).

Indexing the Web is becoming a laborious task for search engines as the Web exponentially grows in size and distribution. Presently, the most effective known approach to overcome this problem is the use of focused crawlers. A focused crawler applies a proper algorithm in order to detect the pages on the Web that relate to its topic of interest. For this purpose we proposed a custom method that uses specific HTML elements of a page to predict the topical focus of all the pages that have an unvisited link within the current page. These recognized on-topic pages have to be sorted later based on their relevance to the main topic of the crawler for further actual downloads. In the Treasure-Crawler, we use a hierarchical structure called the T-Graph which is an exemplary guide to assign appropriate priority score to each unvisited link. These URLs will later be downloaded based on this priority. This paper outlines the architectural design and embodies the implementation, test results and performance evaluation of the Treasure-Crawler system. The Treasure-Crawler is evaluated in terms of information retrieval criteria such as recall and precision, both with values close to 0.5. Gaining such outcome asserts the significance of the proposed approach.
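The abstract above describes assigning a priority score to each unvisited link and downloading URLs in that order. A minimal sketch of such a priority-ordered frontier follows. It does not reproduce the T-Graph scoring of the Treasure-Crawler, which is not specified here; the `Frontier` class and the `keyword_overlap` scorer are hypothetical stand-ins chosen only to illustrate the ordering step.

```python
import heapq
from itertools import count


class Frontier:
    """Priority-ordered queue of unvisited URLs (highest score first).

    Hypothetical illustration; not the Treasure-Crawler implementation.
    """

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._tie = count()  # keeps insertion order among equal scores

    def push(self, url, score):
        """Queue a URL once; return False if it was already queued."""
        if url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the highest first.
        heapq.heappush(self._heap, (-score, next(self._tie), url))
        return True

    def pop(self):
        """Return the (url, score) pair with the highest score."""
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


def keyword_overlap(anchor_text, topic_terms):
    """Assumed stand-in scorer: fraction of topic terms found in the anchor text."""
    if not topic_terms:
        return 0.0
    words = set(anchor_text.lower().split())
    return len(words & topic_terms) / len(topic_terms)


if __name__ == "__main__":
    topic = {"focused", "crawler"}
    frontier = Frontier()
    links = [
        ("http://example.org/a", "Focused crawler survey"),
        ("http://example.org/b", "Cooking recipes"),
        ("http://example.org/c", "Web crawler basics"),
    ]
    for url, anchor in links:
        frontier.push(url, keyword_overlap(anchor, topic))
    while len(frontier):
        print(frontier.pop())
```

In a real focused crawler the scorer would be replaced by the topical prediction and priority model; the queue logic would stay the same.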

“…Seyfi et al [11,12] proposed a focused crawler by using T-graph principles. This work gives solution to two problems in the focused crawler platform.…”

Section: VSM Crawler or Classic Focused Crawler (mentioning)

confidence: 99%

A Critique Empirical Evaluation of Relevance Computation for Focused Web Crawlers

Dhanith, Surendiran, Raja. Braz. arch. biol. technol., 2021.

Analogous to the spectacular growth of information-superhighway, The Internet, demands for coherent and economical crawling methods are translucent to shoot up. Consequently, many innovative techniques have been put forth for efficient crawling. Among them the significant one is focused crawlers. The focused crawlers are capable in searching web pages that are suitable for the topics defined in advance. Focused crawlers attract several search engines on the grounds of efficient filtering, reduced memory and time consumption. This paper furnishes a relevance computation based survey on web crawling. A bunch of fifty two focused crawlers from the existing literature survey is categorized to four different classes -classic focused crawler, semantic focused crawler, learning focused crawler and ontology learning focused crawler. The prerequisite and the mastery of each metric with respect to harvest rate, target recall, precision and F1score are discussed. Future outlooks, shortcomings and strategies are also suggested.
