2016
DOI: 10.1109/tst.2016.7787007
Robots exclusion and guidance protocol

Abstract: With the rapid development of the Internet, general-purpose web crawlers have increasingly become unable to meet people's individual needs, as they are not efficient enough to fetch deep web pages. The prevalence of deep web pages and the widespread use of Ajax make it difficult for general-purpose web crawlers to fetch information quickly and efficiently. On the basis of the original Robots Exclusion Protocol (REP), a Robots Exclusion and Guidance Protocol (REGP) is proposed in this …

Cited by 3 publications (2 citation statements) · References 8 publications
“…Gerdes and Stringam (2008) provide detailed instructions for applying the Robot Exclusion Protocol; here, we provide a quick overview. The Robot Exclusion Standard allows the website manager to indicate whether web scrapers are Allowed or Disallowed access to specific portions of their site, and indicate how frequently robots can visit the site (Ge & Ding, 2016). Typical access frequencies are once per second.…”
Section: Ethics and Integrity In Online Research—not Everything Legal... (mentioning, confidence: 99%)
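As a minimal illustration of the Allow/Disallow and visit-frequency directives described in the statement above, the sketch below parses a small robots.txt with Python's standard-library urllib.robotparser. The robots.txt content, the ExampleBot user agent, the paths, and the one-second Crawl-delay are assumptions chosen for illustration, not details taken from the cited works.

import urllib.robotparser

# Hypothetical robots.txt content illustrating Allow, Disallow, and Crawl-delay.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /public/
Crawl-delay: 1
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Check whether a crawler identifying itself as "ExampleBot" may fetch specific paths.
print(parser.can_fetch("ExampleBot", "https://example.com/public/page.html"))   # True
print(parser.can_fetch("ExampleBot", "https://example.com/private/data.html"))  # False

# A Crawl-delay of 1 corresponds to the "once per second" access frequency noted above.
print(parser.crawl_delay("ExampleBot"))  # 1

A well-behaved scraper would call can_fetch before every request and sleep for at least the reported crawl delay between requests to the same host.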
“…Thom (1999) used the robots.txt file and the robots meta tag to provide guidance to robots on whether and how to catalogue a site they have contacted. The approach helps in creating robots.txt and robots meta tag files that enable webmasters to reduce the load placed on the web server by legitimate robots. As per Ge and Ding (2016), general-purpose web crawlers are not able to crawl and fetch the deep web pages and Ajax pages available in websites under the current robots protocols designed and developed by many search engine companies. In order to help robots crawl any kind of pages, the authors have proposed a Robots Exclusion and Guidance Protocol (REGP) by integrating their proposal with the currently available robots protocols.…”
Section: Literature Review (mentioning, confidence: 99%)
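The excerpt does not give REGP's actual syntax, so the sketch below does not reproduce the authors' protocol. It only illustrates the general "guidance" idea using the widely deployed Sitemap mechanism, which a robots.txt can already reference (via a Sitemap: line) to point crawlers at deep pages they would not discover by following links; the sitemap content and URLs are hypothetical.

import xml.etree.ElementTree as ET

# Hypothetical sitemap a site might expose (e.g. advertised by a
# "Sitemap: https://example.com/sitemap.xml" line in robots.txt) to guide
# crawlers toward deep pages that are hard to reach by link-following alone.
SITEMAP_XML = """\
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/deep/result?page=1</loc></url>
  <url><loc>https://example.com/deep/result?page=2</loc></url>
</urlset>
"""

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
root = ET.fromstring(SITEMAP_XML.encode("utf-8"))

# Collect the advertised URLs; a guided crawler would fetch these directly
# instead of trying to discover deep or Ajax-generated pages on its own.
deep_urls = [loc.text for loc in root.findall("sm:url/sm:loc", NS)]
print(deep_urls)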