2016
DOI: 10.1007/s10115-016-0937-9

Selective AnDE for large data learning: a low-bias memory constrained approach

Abstract: Learning from data that are too big to fit into memory poses great challenges to currently available learning approaches. Averaged n-Dependence Estimators (AnDE) allows for flexible learning from out-of-core data by varying the value of n (the number of super-parents). Hence, AnDE is especially appropriate for learning from large quantities of data. The memory requirement of AnDE, however, increases combinatorially with the number of attributes and the parameter n. In large data learning, the number of attributes is of…
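
For context, a minimal sketch of the rule that AnDE averages over may help; the notation below (a attributes, k classes, v values per attribute) is assumed from the standard AnDE formulation rather than taken from the truncated abstract:

% AnDE builds one n-dependence estimator per size-n attribute subset s,
% then averages them; delta(x_s) = 1 iff the value combination x_s
% occurs in the training data, 0 otherwise.
\hat{P}_{\mathrm{AnDE}}(y \mid \mathbf{x}) \propto
  \sum_{\substack{s \subseteq \{1,\dots,a\} \\ |s| = n}} \delta(x_s)\, \hat{P}(y, x_s)
  \prod_{i=1}^{a} \hat{P}(x_i \mid y, x_s)

Each estimator's count table spans n + 1 attributes jointly, so storage grows roughly as k * C(a, n+1) * v^(n+1), which is the combinatorial growth in a and n that the abstract refers to.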

Citations: cited by 16 publications (6 citation statements, published 2017-2023)
References: 17 publications
“…Then, in the prediction phase, the A2DE classifier predicts the class by even outsourcing one dependence estimator to the number of records available in the dataset. Further, the low bias characteristics and single-pass learning through training data of A2DE make it popular for big data analytics with high accuracy [47], linear time complexity but with high variance, and increased space complexity [48].…”
Section: AnDE (Averaged n-Dependence Estimator)
Citation type: mentioning
Confidence: 99%
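
As an illustration of the single-pass training the statement above refers to, here is a hypothetical Python sketch of A2DE's count-table construction; the function name and data layout are assumptions for illustration, not the cited authors' code:

from collections import defaultdict
from itertools import combinations

# Hypothetical single-pass count-table construction for A2DE (n = 2).
# Every record updates joint counts over (class, super-parent pair, child),
# so training scans the data exactly once; the table itself grows
# combinatorially with the number of attributes, hence the increased
# space complexity noted above.
def train_a2de_counts(records, labels):
    counts = defaultdict(int)
    for x, y in zip(records, labels):
        a = len(x)
        for p1, p2 in combinations(range(a), 2):  # super-parent pair
            for c in range(a):                    # child attribute
                counts[(y, p1, x[p1], p2, x[p2], c, x[c])] += 1
    return counts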
“…The functional domain of one single classifier may be limited as a result of ignoring the dependencies between some attributes. Classifiers that use the forest or ensemble method are commonly applied to fill the gap [12,14,15]. In the following subsection, we first introduce NB and its corresponding ensemble classifier, that is, AODE.…”
Section: Bayesian Network Classifiers
Citation type: mentioning
Confidence: 99%
“…After the discovery of NB, many state-of-the-art algorithms, for example, tree-augmented naive Bayes (TAN) [10] and a k-dependence Bayesian classifier (KDB) [11], are proposed to relax the independence assumption by allowing conditional dependence between attributes X i and X j , which is measured by conditional mutual information I(X i ; X j |C). In order to improve predictive accuracy relative to a single model, ensemble methods [12,13], for example, averaged one-dependence estimator (AODE) [14] and averaged tree-augmented naive Bayes (ATAN) [15] methods, generate multiple global models from a single learning algorithm through randomization (or perturbation). The KDB is a form of a restricted Bayesian network classifier with numerous desirable properties in the context of learning from large quantities of data.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
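
Since the statement above hinges on conditional mutual information I(X_i; X_j | C) as the dependence measure behind TAN and KDB, a minimal empirical computation may clarify it; the sketch below is illustrative and not drawn from any of the cited papers:

import numpy as np

# Empirical I(Xi; Xj | C) from three aligned discrete columns:
# I(Xi; Xj | C) = sum over (a, b, c) of
#     P(a,b,c) * log( P(a,b,c) P(c) / (P(a,c) P(b,c)) ).
def conditional_mutual_info(xi, xj, c):
    xi, xj, c = map(np.asarray, (xi, xj, c))
    cmi = 0.0
    for cv in np.unique(c):
        mask = c == cv
        pc = mask.mean()                                   # P(C = cv)
        for a in np.unique(xi[mask]):
            for b in np.unique(xj[mask]):
                p_abc = np.mean(mask & (xi == a) & (xj == b))  # P(a, b, cv)
                p_ac = np.mean(mask & (xi == a))               # P(a, cv)
                p_bc = np.mean(mask & (xj == b))               # P(b, cv)
                if p_abc > 0:
                    cmi += p_abc * np.log(p_abc * pc / (p_ac * p_bc))
    return cmi

TAN weights its spanning tree and KDB ranks candidate parents by this quantity, whereas AODE avoids structure search entirely by averaging over all one-dependence estimators.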
“…Although information theory was primarily concerned with the problem of digital communication when it was first introduced by Claude E. Shannon in the 1940s [15], the theory has much broader applicability in the field of classification [16,17]. Here, we review several commonly used definitions.…”
Section: Information Theory
Citation type: mentioning
Confidence: 99%