Decision tree classifier for network intrusion detection with GA-based feature selection

Stein, Gary L.; Chen, Bing; Wu, Annie S.; Hua, Kien A.

doi:10.1145/1167253.1167288

Cited by 274 publications

(125 citation statements)

References 21 publications

Supporting

Mentioning

124

Contrasting

Unclassified


“…1 Comparison of feature selection studies for network traffic anomaly detection not specified. For the same data set, in Stein et al (2005) a genetic algorithm (GA) wrapper with a DTC as a validation model looks for relevant features. They are shown for the DoS type of attack case.…”

Section: Related Work (mentioning)

confidence: 99%


Analysis of network traffic features for anomaly detection

2014


Anomaly detection in communication networks provides the basis for the uncovering of novel attacks, misconfigurations and network failures. Resource constraints for data storage, transmission and processing make it beneficial to restrict input data to features that are (a) highly relevant for the detection task and (b) easily derivable from network observations without expensive operations. Removing strong correlated, redundant and irrelevant features also improves the detection quality for many algorithms that are based on learning techniques. In this paper we address the feature selection problem for network traffic based anomaly detection. We propose a multi-stage feature selection method using filters and stepwise regression wrappers. Our analysis is based on 41 widely-adopted traffic features that are presented in several commonly used traffic data sets. With our combined feature selection method we could reduce the original feature vectors from 41 to only 16 features. We tested our results with five fundamentally different classifiers, observing no significant reduction of the detection performance. In order to quantify the practical benefits of our results, we analyzed the costs for generating individual features from standard IP Flow Information Export records, available at many routers. We show that we can eliminate 13 very costly features and thus reducing the computational effort for on-line feature generation from live traffic observations at network nodes.


“…4). f20 is usually scorned also in related works, except for Stein et al (2005), which utilizes the KDD Cup'99 (where also f20 is 0 for all observations).…”

Section: Feature Weighting and Ranking (mentioning)

confidence: 99%

Analysis of network traffic features for anomaly detection

2014


“…However, Gary Stein et al [15] suggest that not all 41 features are required for classification of four categories of attack: Probe, DOS, U2R and R2L. In their work they used Genetic Algorithm to select relevant features for decision tree, with a goal of increasing detection rate and decreasing false alarm rate.…”

Section: Decision Tree (mentioning)

confidence: 99%

Survey on Classification Techniques for Intrusion Detection

Sapate

Raut

2014

Computer Science & Information Technology (CS & IT)


“…Feature Selection is used to minimise the number of metrics in a given dataset and to optimise the selection process of the most relevant set of metrics [2]. These techniques play an important role in improving the efficiency of IDSs, producing more accurate results.…”

Section: Introduction (mentioning)

confidence: 99%

Automatic Dataset Labelling and Feature Selection for Intrusion Detection Systems

Aparicio-Navarro

Kyriakopoulos

Parish

2014

2014 IEEE Military Communications Conference


Francisco J. Aparicio-Navarro, Konstantinos G. Kyriakopoulos, David J. Parish. School of Electronic, Electrical and System Engineering, Loughborough University, Loughborough, LE11 3TU, UK. E-mail: {elfja2, elkk, d.j.parish}@lboro.ac.uk

Abstract: Correctly labelled datasets are commonly required. Three particular scenarios are highlighted, which showcase this need. When using supervised Intrusion Detection Systems (IDSs), these systems need labelled datasets to be trained. Also, the real nature of the analysed datasets must be known when evaluating the efficiency of the IDSs when detecting intrusions. Another scenario is the use of feature selection that works only if the processed datasets are labelled. In normal conditions, collecting labelled datasets from real networks is impossible. Currently, datasets are mainly labelled by implementing off-line forensic analysis, which is impractical because it does not allow real-time implementation. We have developed a novel approach to automatically generate labelled network traffic datasets using an unsupervised anomaly based IDS. The resulting labelled datasets are subsets of the original unlabelled datasets. The labelled dataset is then processed using a Genetic Algorithm (GA) based approach, which performs the task of feature selection. The GA has been implemented to automatically provide the set of metrics that generate the most appropriate intrusion detection results.
