An intrusion detection system (IDS) identifies malicious activity on a network. As Internet traffic volume grows rapidly, protection against real-time attacks and their fast detection has drawn the attention of many researchers. Data mining methods can be applied effectively to IDS to handle large, dynamic network data and to improve IDS performance. Time complexity can be reduced by selecting only the useful features when building a classification model. Many techniques have been developed to either select or extract features. In this paper, an evolutionary approach to feature selection is proposed, based on the mathematical intersection principle. A genetic algorithm (GA) is used as the search method to select features from the full NSL-KDD data set, and the intersection principle retains only those features that appear in every selection made during the experiment. When the proposed approach is compared across classifiers, it shows a marked improvement in the accuracy of a Naïve Bayes classifier, with reduced time and a minimal number of features.
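The intersection principle described in the abstract above can be illustrated with a minimal sketch. It assumes that each GA run yields one subset of feature names and that only features present in every subset are retained. The function name `intersect_selected_features` and the example subsets are hypothetical; the abstract does not report the GA configuration or the features actually selected.

```python
from functools import reduce


def intersect_selected_features(runs):
    """Return, in sorted order, the features present in every run's subset."""
    if not runs:
        return []
    # Set intersection keeps only the features common to all subsets.
    common = reduce(lambda a, b: a & b, (set(r) for r in runs))
    return sorted(common)


# Hypothetical subsets, standing in for the output of three GA runs.
# The names are NSL-KDD attributes, but the selections are made up.
runs = [
    ["src_bytes", "dst_bytes", "service", "flag"],
    ["src_bytes", "service", "count", "flag"],
    ["service", "src_bytes", "flag", "duration"],
]

print(intersect_selected_features(runs))  # -> ['flag', 'service', 'src_bytes']
```

Intersection is a strict criterion: a feature dropped by a single run is excluded, so the final subset can only shrink as more runs are added.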
Recently, stream data mining applications have drawn considerable attention from several research communities. Stream data is a continuous form of data distinguished by its online nature. Traditionally, the machine learning field has developed learning algorithms that make certain assumptions about the underlying distribution of the data, for example that the data follow a predetermined distribution. Such constraints on the problem domain pave the way for learning algorithms whose performance is theoretically verifiable. Real-world situations differ from this restricted model. Applications often suffer from problems such as unbalanced data distributions. In addition, data drawn from non-stationary environments are common in real-world applications, giving rise to the "concept drift" associated with data stream examples. Researchers have addressed these issues separately, but the joint problem of class imbalance and concept drift has received relatively little research attention. If the ultimate objective of intelligent machine learning techniques is to address a broad spectrum of real-world applications, then the need for a universal framework for learning from, and adapting to, environments where concept drift may occur and unbalanced data distributions are present can hardly be overstated. In this paper, we first present an overview of the issues observed in stream data mining scenarios, followed by a complete review of recent research dealing with each issue.
One way to improve the accuracy of a classifier is to use a minimal set of features. Many feature selection techniques have been proposed to identify the most important features. In this paper, three feature selection methods, Correlation-based Feature Selection, the wrapper method, and Information Gain, are applied before supervised learning-based classification techniques. The results show that a Support Vector Machine with Information Gain and with the wrapper method achieves the best results among the combinations tested.
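As an illustration of one of the selection criteria named above, the following sketch computes Information Gain for categorical features as the reduction in label entropy after splitting on a feature. It is a textbook formulation and not necessarily the implementation used in the paper; the toy data and feature names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: one informative and one uninformative feature.
labels = ["attack", "attack", "normal", "normal"]
features = {
    "informative": ["x", "x", "y", "y"],
    "uninformative": ["x", "y", "x", "y"],
}

# Rank features from highest to lowest Information Gain.
ranking = sorted(features,
                 key=lambda name: information_gain(features[name], labels),
                 reverse=True)
print(ranking)
```

A filter criterion such as this scores each feature independently of the classifier, whereas a wrapper method evaluates candidate subsets by training the classifier itself, which is typically more expensive.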
The imbalanced learning problem is becoming ubiquitous in data mining applications. Data sets with an unequal distribution of samples among classes are known as imbalanced data sets. When a highly imbalanced data set is given to a classifier, the classifier may misclassify the rare samples from the minority class. To deal with this kind of imbalance, several undersampling and oversampling methods have been proposed. Many undersampling techniques do not consider the distribution of information among the classes; similarly, some oversampling techniques lead to overfitting or cause overgeneralization. This paper proposes an MLP-based undersampling technique (MLPUS) that preserves the distribution of information while undersampling. The technique uses stochastic measure evaluation to identify important samples from both the majority and the minority classes. Experiments are performed on five real-world data sets to evaluate the performance of the proposed work.
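For context, the sketch below shows plain random undersampling, the kind of baseline that ignores how information is distributed within each class. It is not the MLPUS method, whose details are not given in the abstract; the function name `random_undersample` and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly reduce every class to the size of the smallest class."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = min(len(xs) for xs in by_class.values())
    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        # Samples are picked uniformly at random, with no regard to
        # how informative each one is.
        out_samples.extend(rng.sample(xs, target))
        out_labels.extend([y] * target)
    return out_samples, out_labels


# Hypothetical imbalanced toy set: 8 majority and 2 minority samples.
samples = list(range(10))
labels = ["majority"] * 8 + ["minority"] * 2

balanced_samples, balanced_labels = random_undersample(samples, labels)
print(Counter(labels))
print(Counter(balanced_labels))
```

Because the discarded majority samples are chosen blindly, informative examples may be lost, which is the weakness that selection-based undersampling approaches aim to address.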