2016
DOI: 10.5121/avc.2016.3301
Survey of Web Crawling Algorithms


Cited by 4 publications (5 citation statements)
References 17 publications (14 reference statements)
“…Through the mining of web crawler algorithms, various possibilities are verified, including breadth-first (search the neighbors at the same level), depth-first (traverse to the bottom from the root node), URL ordering (queue), page-rank (importance based on the number of backlinks or citations), online page importance (importance of a page in a website), largest sites first (websites with the largest number of pages), page request—HTTP or the dynamic, customized site map (applicable to deal with updates on already visited pages), and filtering (query-based approach) [ 7 , 80 , 81 ]. In some of these algorithms, keywords are accepted as the search query, and all relevant URLs fulfilling that search query are returned.…”
Section: Discussion
confidence: 99%
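The breadth-first and URL-ordering (queue) strategies named in this excerpt can be sketched briefly. This is an illustrative assumption, not code from the cited survey: the in-memory `LINKS` graph stands in for real HTTP fetching and link extraction, and the `bfs_crawl` name is made up.

```python
from collections import deque

# Hypothetical in-memory link graph; a real crawler would fetch each
# URL over HTTP and extract its outlinks from the page markup.
LINKS = {
    "https://example.com/":  ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/c"],
    "https://example.com/b": [],
    "https://example.com/c": [],
}

def bfs_crawl(seed, max_pages=100):
    """Breadth-first crawl: visit all neighbours at the current level
    before descending, with a FIFO queue providing the URL ordering."""
    queue = deque([seed])
    seen = {seed}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in LINKS.get(url, []):
            if link not in seen:   # avoid revisiting pages
                seen.add(link)
                queue.append(link)
    return visited
```

Swapping the `deque` for a stack (`pop()` from the end) would turn the same loop into the depth-first variant the excerpt also mentions.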
“…The process of getting information from web pages can be done through web crawling processes and through the Really Simple Syndication (RSS) format. Some web crawling methods such as By HTTP Get Request and Dynamic Web Page and By the use of filters are the most preferred methods [4]. In addition, the RSS format is a form of content syndication from Extensible Markup Language (XML) based websites that can also be used [5].…”
Section: Theoretical Basis and Related Work
confidence: 99%
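The RSS path described in this excerpt can be sketched with the standard library; as an assumption, a hardcoded RSS 2.0 string replaces a live feed, and the `parse_rss` helper is hypothetical rather than taken from the citing paper.

```python
import xml.etree.ElementTree as ET

# Minimal RSS 2.0 document used as a stand-in for a fetched feed.
RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item><title>First post</title><link>https://example.com/1</link></item>
    <item><title>Second post</title><link>https://example.com/2</link></item>
  </channel>
</rss>"""

def parse_rss(xml_text):
    """Extract (title, link) pairs from the <item> elements of an
    RSS 2.0 channel, which is itself XML-based syndication."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]
```

In practice the string would come from an HTTP GET request against the feed URL, which connects this path back to the crawling methods the excerpt prefers.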
“…The application of the Naïve Bayes classification method is done by applying the Bayes theorem which is formulated by equation (4).…”
Section: Training Using the SVM, KNN, and Naïve Bayes
confidence: 99%
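The classification step this excerpt mentions rests on Bayes' theorem, P(C|x) ∝ P(x|C)·P(C), combined with the naive independence assumption across features. The citing paper's equation (4) is not reproduced here; the toy priors and likelihoods below are made-up illustrative values.

```python
from math import prod

def naive_bayes_score(prior, likelihoods):
    """Unnormalised posterior under the naive independence assumption:
    P(C | x1..xn) is proportional to P(C) * product of P(xi | C)."""
    return prior * prod(likelihoods)

def classify(priors, likelihoods_per_class):
    """Pick the class with the highest unnormalised posterior; the
    shared evidence term P(x) cancels in the comparison."""
    scores = {c: naive_bayes_score(priors[c], likelihoods_per_class[c])
              for c in priors}
    return max(scores, key=scores.get)
```
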
“…In this work we focus on tasks (b) and (c). To understand the detailed working of crawlers, see [2,3,4,5,6].…”
Section: Introduction
confidence: 99%