2012
DOI: 10.1016/j.eswa.2012.01.210
Feature evaluation for web crawler detection with data mining techniques

Cited by 70 publications (50 citation statements)
References 13 publications
“…Another system proposed in this regard converts malicious requests into signatures by means of an anomaly extension [3]. A later system, named MINDS, extended this approach [4]. MINDS detected different kinds of intrusion at the network layer using components such as a scan detector, an anomaly detector, and a summarization module.…”
Section: Related Work (mentioning)
confidence: 99%
“…Although traditional search engine algorithms, technologies, and tools can acquire the expected network information, their efficiency and accuracy are limited because the search is keyword-based. Thus, web crawler technology, which can acquire web information according to users' needs and search the entire network, has become a research focus [9][10][11]. Based on the existing research results, this paper discusses and analyzes path planning, error control, and strategy implementation of the search, as well as the realization of the system.…”
Section: Introduction (mentioning)
confidence: 99%
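
As a rough illustration of the kind of crawler technology this statement refers to, the following Python sketch performs a breadth-first traversal of hyperlinks from a seed URL, staying on the seed's host and skipping unreachable pages. The page limit, host restriction, and error handling are illustrative assumptions, not details taken from the cited work.

from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects href targets from <a> tags on a fetched page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_url, max_pages=20):
    """Breadth-first crawl from seed_url, restricted to the seed's host."""
    host = urlparse(seed_url).netloc
    queue, seen, fetched = deque([seed_url]), {seed_url}, []
    while queue and len(fetched) < max_pages:
        url = queue.popleft()
        try:
            with urlopen(url, timeout=5) as response:
                html = response.read().decode("utf-8", errors="replace")
        except OSError:
            continue  # basic error control: skip unreachable pages
        fetched.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return fetched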
“…Although other features such as the standard deviation of requested page depth, the percentage of consecutive sequential HTTP requests, the html-to-image ratio, and the percentage of 4xx error responses are robot session features used in previous machine learning classifiers [13], they may not be indicative of the future resource request behavior of Web robots. This is because the kinds of resources requested by a Web robot are likely to be independent of session level features.…”
Section: Classification Algorithms (mentioning)
confidence: 99%
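
To make the named session-level features concrete, the sketch below computes them for a single session represented as a list of (path, status_code) pairs. The input layout and the exact definitions (treating the number of path segments as page depth, and counting consecutive requests to the same parent directory as "sequential") are assumptions for illustration, not the original classifiers' formulas.

import statistics

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".ico")
HTML_EXTENSIONS = (".html", ".htm", "/")


def session_features(requests):
    """Illustrative session-level features from ordered (path, status_code) pairs."""
    if not requests:
        return {}
    paths = [path for path, _ in requests]
    statuses = [status for _, status in requests]
    # Page depth = number of path segments, e.g. "/a/b/c.html" -> 3.
    depths = [len([seg for seg in path.split("/") if seg]) for path in paths]
    html_count = sum(1 for path in paths if path.lower().endswith(HTML_EXTENSIONS))
    image_count = sum(1 for path in paths if path.lower().endswith(IMAGE_EXTENSIONS))
    # Consecutive requests whose paths share the same parent directory.
    sequential = sum(
        1 for a, b in zip(paths, paths[1:])
        if a.rsplit("/", 1)[0] == b.rsplit("/", 1)[0]
    )
    return {
        "depth_std": statistics.pstdev(depths) if len(depths) > 1 else 0.0,
        "pct_sequential": sequential / max(len(paths) - 1, 1),
        "html_to_image": html_count / max(image_count, 1),
        "pct_4xx": sum(1 for s in statuses if 400 <= s < 500) / len(statuses),
    }

For example, session_features([("/index.html", 200), ("/img/logo.png", 200), ("/img/banner.png", 404)]) gives an html-to-image ratio of 0.5 and a 4xx percentage of roughly 0.33.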
“…Stevanovic et al argue that two features, namely the standard deviation of requested pagedepth and the percentage of consecutive sequential HTTP requests belonging to the same web directory, are essential to separate robots from humans in Web server logs [13]. Yang et al also consider association rule mining to create an n-gram model of occurrence frequencies.…”
Section: Related Workmentioning
confidence: 99%
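
The n-gram idea attributed to Yang et al. can be sketched as counting the occurrence frequencies of short subsequences over a session's ordered resource requests. This is a generic n-gram frequency count for illustration, not the cited authors' exact model, and the resource-type labels in the usage example below are invented.

from collections import Counter


def ngram_frequencies(resource_sequence, n=2):
    """Relative occurrence frequencies of length-n subsequences (n-grams)
    in an ordered list of requested resource types."""
    grams = zip(*(resource_sequence[i:] for i in range(n)))
    counts = Counter(grams)
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()} if total else {}

For example, ngram_frequencies(["html", "img", "img", "html", "css"], n=2) assigns each of the four observed bigrams a frequency of 0.25.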