2016
DOI: 10.1109/tst.2016.7787007
Robots exclusion and guidance protocol

Abstract: With the rapid development of the Internet, general-purpose web crawlers have increasingly become unable to meet people's individual needs, as they are not efficient enough to fetch deep web pages. The prevalence of deep web pages and the widespread use of Ajax make it difficult for general-purpose web crawlers to fetch information quickly and efficiently. On the basis of the original Robots Exclusion Protocol (REP), a Robots Exclusion and Guidance Protocol (REGP) is proposed in this …

Cited by 3 publications (2 citation statements) · References 8 publications
“…Gerdes and Stringam (2008) provide detailed instructions for applying the Robot Exclusion Protocol; here, we provide a quick overview. The Robot Exclusion Standard allows the website manager to indicate whether web scrapers are Allowed or Disallowed access to specific portions of their site, and indicate how frequently robots can visit the site (Ge & Ding, 2016). Typical access frequencies are once per second.…”
Section: Ethics and Integrity In Online Research—not Everything Legal... (mentioning, confidence: 99%)
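As a minimal illustration of the Allow/Disallow and visit-frequency directives described in the statement above, the sketch below parses a small robots.txt with Python's standard-library urllib.robotparser. The robots.txt content, the ExampleBot user agent, the paths, and the one-second Crawl-delay are assumptions chosen for illustration, not details taken from the cited works.

import urllib.robotparser

# Hypothetical robots.txt content illustrating Allow, Disallow, and Crawl-delay.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /public/
Crawl-delay: 1
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Check whether a crawler identifying itself as "ExampleBot" may fetch specific paths.
print(parser.can_fetch("ExampleBot", "https://example.com/public/page.html"))   # True
print(parser.can_fetch("ExampleBot", "https://example.com/private/data.html"))  # False

# A Crawl-delay of 1 corresponds to the "once per second" access frequency noted above.
print(parser.crawl_delay("ExampleBot"))  # 1

A well-behaved scraper would call can_fetch before every request and sleep for at least the reported crawl delay between requests to the same host.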
“…Thom (1999) used the robots.txt file and the robots meta tag to provide guidance to robots on whether and how to catalogue a site they have contacted. The approach helps in creating robots.txt and robots meta tag files that enable webmasters to reduce the load placed on the web server by legitimate robots. As per Ge and Ding (2016), general-purpose web crawlers are not able to crawl and fetch the deep web pages and Ajax pages available in websites under the current robots protocols designed and developed by many search engine companies. In order to help robots crawl any kind of pages, the authors have proposed a Robots Exclusion and Guidance Protocol (REGP) by integrating their proposal with the currently available robots protocols.…”
Section: Literature Review (mentioning, confidence: 99%)
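The excerpt does not give REGP's actual syntax, so the sketch below does not reproduce the authors' protocol. It only illustrates the general "guidance" idea using the widely deployed Sitemap mechanism, which a robots.txt can already reference (via a Sitemap: line) to point crawlers at deep pages they would not discover by following links; the sitemap content and URLs are hypothetical.

import xml.etree.ElementTree as ET

# Hypothetical sitemap a site might expose (e.g. advertised by a
# "Sitemap: https://example.com/sitemap.xml" line in robots.txt) to guide
# crawlers toward deep pages that are hard to reach by link-following alone.
SITEMAP_XML = """\
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/deep/result?page=1</loc></url>
  <url><loc>https://example.com/deep/result?page=2</loc></url>
</urlset>
"""

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
root = ET.fromstring(SITEMAP_XML.encode("utf-8"))

# Collect the advertised URLs; a guided crawler would fetch these directly
# instead of trying to discover deep or Ajax-generated pages on its own.
deep_urls = [loc.text for loc in root.findall("sm:url/sm:loc", NS)]
print(deep_urls)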