2016
DOI: 10.1016/j.asoc.2016.05.044
Evolving meta-ensemble of classifiers for handling incomplete and unbalanced datasets in the cyber security domain

Cited by 27 publications (9 citation statements)
References 20 publications
“…Therefore, in large datasets plagued by MCAR missing data, samples with missing values can be discarded without biasing the distribution of the remaining data. This study simulated missing rates of 10%, 20%, 30%, 40%, and 50% [9,15,20,32,35,41] to compare the proposed method to the imputation methods listed in the UCI datasets experiment. As shown in Eq.…”
Section: Data Pre-processing
confidence: 99%
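The quoted study simulates MCAR (missing completely at random) rates of 10-50% and then discards incomplete samples. A minimal sketch of that procedure, assuming a NumPy feature matrix (the function name `inject_mcar` is illustrative, not from the paper):

```python
import numpy as np

def inject_mcar(X, rate, seed=0):
    """Set a fraction `rate` of entries to NaN, uniformly at random (MCAR)."""
    rng = np.random.default_rng(seed)
    X = X.astype(float).copy()
    X[rng.random(X.shape) < rate] = np.nan
    return X

X = np.arange(20.0).reshape(5, 4)
X_missing = inject_mcar(X, rate=0.3)

# Listwise deletion: under MCAR, dropping incomplete rows does not bias
# the distribution of the remaining complete rows.
complete_rows = X_missing[~np.isnan(X_missing).any(axis=1)]
```

Because the missingness mask is independent of the data values, the surviving rows are an unbiased subsample, which is exactly why the quoted passage argues deletion is safe for large MCAR datasets.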
See 1 more Smart Citation
“…Therefore, in large datasets plagued by MCAR missing data, samples with missing values can be discarded without biasing the distribution of the remaining data. This study simulated missing rates of 10%, 20%, 30%, 40%, and 50% [9,15,20,32,35,41] to compare the proposed method to the imputation methods listed in the UCI datasets experiment. As shown in Eq.…”
Section: Data Pre-processingmentioning
confidence: 99%
“…The mean and mode are common statistical MVI technique measurements that typically require a short time to compute. However, machine learning MVI techniques, such as support vector machine (SVM) and random forest (RF) methods, require a long computation time to achieve high accuracy [20][21][22][23]. On the other hand, the k-nearest neighbor (KNN) technique [24] requires much less imputation time than other machine learning techniques [25][26][27][28].…”
confidence: 99%
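The two MVI families contrasted above can be sketched in pure NumPy: column-mean imputation (one pass, fast) versus a simple KNN imputation that searches the complete rows for each incomplete one (more work per missing row, which is the computational cost the quote refers to). This is a toy sketch, not the implementation from any of the cited papers:

```python
import numpy as np

def mean_impute(X):
    """Fill each NaN with its column mean (fast statistical MVI)."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def knn_impute(X, k=2):
    """Fill each NaN with the mean of that feature over the k nearest
    complete rows, measuring distance on the observed features only."""
    X_out = X.copy()
    complete = X[~np.isnan(X).any(axis=1)]
    for i, row in enumerate(X):
        miss = np.isnan(row)
        if not miss.any():
            continue
        d = np.linalg.norm(complete[:, ~miss] - row[~miss], axis=1)
        nearest = complete[np.argsort(d)[:k]]
        X_out[i, miss] = nearest[:, miss].mean(axis=0)
    return X_out

X = np.array([[1.0, 2.0], [np.nan, 4.0], [5.0, np.nan], [7.0, 8.0]])
mean_filled = mean_impute(X)
knn_filled = knn_impute(X, k=2)
```

Here `mean_impute` fills the first gap with (1+5+7)/3, while `knn_impute` averages over the two complete neighbors [1, 2] and [7, 8].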
“…Considering all these issues, we choose to adopt CAGE-MetaCombiner, which employs a meta-ensemble model to operate efficiently with missing data, described in detail in [9], in our scalable architecture proposed in this paper. To synthesise, an ensemble is built for each different source of data (dataset), and a distributed GP tool, CellulAr GEnetic programming (CAGE) [5], a distributed/parallel genetic programming (GP) implementation, is used to generate the combiner function.…”
Section: Missing Data and Ensemble of Classifiers
confidence: 99%
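The architecture quoted above (one ensemble per data source, fused by a combiner function) can be sketched as follows. In CAGE-MetaCombiner the combiner is *evolved* by genetic programming; the fixed weighted average below is only a stand-in for that evolved function, and all names here are illustrative:

```python
import numpy as np

def source_ensemble_predict(models, X):
    """Average the class-1 probability outputs of one source's ensemble."""
    return np.mean([m(X) for m in models], axis=0)

def meta_combine(source_probs, weights):
    """Stand-in combiner: weighted average of per-source ensemble outputs.
    (CAGE-MetaCombiner evolves this function with distributed GP instead.)"""
    w = np.asarray(weights, dtype=float)
    return np.tensordot(w / w.sum(), np.stack(source_probs), axes=1)

# Toy base "models": callables returning class-1 probabilities per sample.
ens_a = [lambda X: np.full(len(X), 0.8), lambda X: np.full(len(X), 0.6)]
ens_b = [lambda X: np.full(len(X), 0.2)]

X = np.zeros((4, 3))
p_a = source_ensemble_predict(ens_a, X)     # 0.7 per sample
p_b = source_ensemble_predict(ens_b, X)     # 0.2 per sample
p = meta_combine([p_a, p_b], weights=[2, 1])
```

One practical consequence of the per-source design, as the quote notes, is efficiency with missing data: a sample with an entirely absent source can still be scored by the remaining source ensembles.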
“…These aim to include mutually complementary individual classifiers, which are characterized by high diversity in terms of classifier structure [34], internal parameters or classifier inputs. As stated in [35], ensemble and meta-ensemble methods show a number of advantages with regard to using a single model, i.e., they reduce the variance of the error, the bias, and the dependence on a single dataset and work well in the case of unbalanced classes.…”
Section: The Proposed Model
confidence: 99%
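The variance-reduction claim in the quote above has a simple numerical illustration: averaging B predictors with independent, equal-variance errors shrinks the error variance by roughly a factor of B. A minimal simulation (a generic sketch, not tied to the cited models):

```python
import numpy as np

rng = np.random.default_rng(42)
true_value = 1.0
n_models, n_trials = 25, 2000

# Single model: prediction = truth + unit-variance noise.
single = true_value + rng.normal(0.0, 1.0, size=n_trials)

# Ensemble of 25 models with independent noise, averaged per trial.
ensemble = true_value + rng.normal(0.0, 1.0, size=(n_trials, n_models)).mean(axis=1)

var_single = single.var()
var_ensemble = ensemble.var()  # ~ var_single / n_models for independent errors
```

In practice base classifiers are correlated, so the reduction is smaller than 1/B, which is why the quoted passage stresses *diversity* of classifier structure, parameters, and inputs.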