International Conference on Electrical, Electronic and Computer Engineering, 2004. ICEEC '04.
DOI: 10.1109/iceec.2004.1374396
More effective, efficient, and scalable web crawler system architecture

Cited by 2 publications (2 citation statements)
References 6 publications
“…The pages are retrieved by the web crawler, which follows the links available on each page. Each page is sent to the parser, a major component of the crawling technology, which checks whether relevant information has been retrieved [5]. The relevant content is then indexed by the indexer [6] and stored for later use.…”
Section: Architecture of a Crawler
confidence: 99%
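The pipeline this citation statement describes (fetch a page, follow its links, parse for relevance, index relevant content) can be sketched in a few lines. The sketch below is an illustration only, not the architecture proposed in the cited paper; the `is_relevant` filter, the in-memory index, and the breadth-first frontier are assumed placeholders.

```python
# Minimal sketch of the crawler -> parser -> indexer pipeline described
# above. is_relevant and the in-memory index are illustrative assumptions,
# not components taken from the cited paper.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects hyperlinks found in a fetched page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def is_relevant(html):
    # Placeholder relevance check; a real parser component would apply
    # topic filters or content analysis here (assumption).
    return "crawler" in html.lower()


def crawl(seed, max_pages=10):
    frontier = [seed]          # URLs waiting to be fetched
    seen = set()
    index = {}                 # indexer: stores relevant content for later use

    while frontier and len(index) < max_pages:
        url = frontier.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except (OSError, ValueError):
            continue           # skip unreachable or non-HTTP URLs

        parser = LinkParser()
        parser.feed(html)
        # Follow the links available on the retrieved page.
        frontier.extend(urljoin(url, link) for link in parser.links)

        if is_relevant(html):
            index[url] = html  # indexed and stored for later use

    return index
```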
“…The nature of the web is to link multiple resources as hyperlinks among them and, following that analogy, the process of reaching an end resource is done by crawling the interconnected nodes. Historically, search engines have been fed by multiple web crawlers [4,6,9] that automatically track and follow the hyperlinks in web content, creating a database of entries that are usually formatted into a human-readable view to be presented to humans. This adds overhead to the automatic retrieval of content from search engines, as their results usually have to be analyzed and parsed from a markup language; in addition, navigation through their content is often handled dynamically by JavaScript code in the form of AJAX calls [5], which requires a sort of human intervention, such as scrolling down the content or clicking on certain regions of the view, adding extra layers of complexity to the task of crawling those web sites.…”
Section: Introduction
confidence: 99%
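A concrete illustration of the parsing overhead this statement mentions: recovering structured (title, URL) entries from a human-readable, markup-formatted results page. This is a minimal sketch under an assumed markup shape; note that content a page injects later through JavaScript/AJAX calls never appears in the statically fetched HTML, which is exactly why such pages demand the extra handling the authors describe.

```python
# Minimal sketch: turning a markup-formatted results page back into
# structured (title, URL) entries. The markup shape is an assumption;
# anything loaded dynamically via AJAX will not be present in static HTML.
from html.parser import HTMLParser


class ResultExtractor(HTMLParser):
    """Extracts (anchor text, href) pairs from static HTML."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href:
            self.results.append(("".join(self._text).strip(), self._href))
            self._href = None


extractor = ResultExtractor()
extractor.feed('<ul><li><a href="https://example.org">Example entry</a></li></ul>')
print(extractor.results)  # [('Example entry', 'https://example.org')]
```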