2019
DOI: 10.1007/978-3-030-19738-4_36
Multi Sampling Random Subspace Ensemble for Imbalanced Data Stream Classification

Cited by 8 publications (4 citation statements) · References 24 publications
“…(1) Predicting death from hospital-acquired infections in trauma patients without a balanced dataset (C5.0 and CHAID); (2) Predicting death from hospital-acquired infections in trauma patients using a dataset balanced by sampling methods (reduced data set) (C5.0 and CHAID); (3) Clustering hospital-acquired infections in trauma patients with the k-means algorithm; (4) Predicting death from hospital-acquired infections in trauma patients within each cluster (C5.0 and CHAID); (5) Predicting death from hospital-acquired infections in trauma patients with SMOTE-C5.0 and ADASYN-C5.0; (6) Predicting death from hospital-acquired infections in trauma patients with SMOTE-SVM, ADASYN-SVM, SMOTE-ANN, and ADASYN-ANN. Many previous studies have attempted to handle imbalanced data [12][13][14] by adopting various approaches, such as using appropriate evaluation metrics, resampling the training set (under-sampling and over-sampling), applying K-fold cross-validation properly, ensembling differently resampled datasets, resampling at different ratios, and clustering the frequent class. However, no single best model for these problems has been identified, as performance depends strongly on the techniques, models, and subjects used [2].…”
Section: Introduction
confidence: 99%
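The SMOTE and ADASYN oversamplers mentioned in the statement above both synthesize new minority-class examples by interpolating between a minority point and one of its minority-class nearest neighbours. A minimal pure-Python sketch of that interpolation step follows; the helper name and the toy data are illustrative assumptions, not any library's API:

```python
import random

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Generate synthetic minority samples by interpolating between a
    point and one of its k nearest minority neighbours (SMOTE's core idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x within the minority class (Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

# toy minority class in 2-D feature space (illustrative data)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_oversample(minority, n_new=4)
```

Each synthetic point is a convex combination of two existing minority points, so it always lies between them in feature space; ADASYN differs mainly in biasing how many points are generated near hard-to-learn examples.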
“…The previous components are regularly updated so that the ensemble reacts to different kinds of concept drift. Klikowski et al. 129 proposed the multi-sampling random subspace ensemble method (MSRS). The algorithm combines the random subspace method with a variety of sampling techniques that balance the data, using different oversampling strategies per member to ensure proper diversity of the classifier ensemble.…”
Section: Passive Handling Methods
confidence: 99%
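The MSRS idea summarized above — each ensemble member trained on a random feature subspace with its own rebalanced copy of the data, combined by majority vote — can be sketched as follows. This is a toy illustration under assumptions of my own (a nearest-centroid base learner and plain random oversampling), not the authors' exact algorithm:

```python
import random

def train_member(data, labels, n_features, subspace_size, rng):
    """One ensemble member: pick a random feature subspace, then rebalance
    the projected training data by randomly oversampling the minority class."""
    feats = rng.sample(range(n_features), subspace_size)
    by_class = {}
    for x, y in zip(data, labels):
        by_class.setdefault(y, []).append([x[f] for f in feats])
    target = max(len(rows) for rows in by_class.values())
    for rows in by_class.values():
        # random oversampling: duplicate minority rows until classes are even
        rows.extend(rng.choice(rows) for _ in range(target - len(rows)))
    # nearest-centroid base learner: one mean vector per class
    centroids = {
        y: [sum(col) / len(rows) for col in zip(*rows)]
        for y, rows in by_class.items()
    }
    return feats, centroids

def predict(members, x):
    """Majority vote over the subspace members."""
    votes = {}
    for feats, centroids in members:
        xs = [x[f] for f in feats]
        y = min(centroids,
                key=lambda c: sum((a - b) ** 2 for a, b in zip(xs, centroids[c])))
        votes[y] = votes.get(y, 0) + 1
    return max(votes, key=votes.get)

rng = random.Random(1)
data = [(0.0, 0.1, 0.0), (0.1, 0.0, 0.2), (0.2, 0.1, 0.1), (0.1, 0.2, 0.0),
        (1.0, 1.1, 0.9), (0.9, 1.0, 1.1)]   # class 1 is the minority
labels = [0, 0, 0, 0, 1, 1]
members = [train_member(data, labels, n_features=3, subspace_size=2, rng=rng)
           for _ in range(5)]
```

Sampling both the feature subspace and the rebalancing independently per member is what gives the ensemble its diversity: no two members see quite the same view of the stream chunk.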
“…The most popular approach lies in combining resampling techniques with Online Bagging (Wang et al., 2015; Wang and Pineau, 2016). Similar strategies can be applied to Adaptive Random Forest (Gomes et al., 2017), Online Boosting (Klikowski and Woźniak, 2019; Gomes et al., 2019), Dynamic Feature Selection (Wu et al., 2014), Adaptive Random Forest with resampling (Ferreira et al., 2019), Kappa Updated Ensemble (Cano and Krawczyk, 2020), Robust Online Self-Adjusting Ensemble (Cano and Krawczyk, 2022), or any ensemble that can incrementally update its base learners (Ancy and Paulraj, 2020; Li et al., 2020). It is interesting to note that preprocessing approaches enhance diversity among base classifiers (Zyblewski et al., 2019).…”
Section: Ensembles For Imbalanced Data Streams
confidence: 99%
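In Oza-style Online Bagging, referenced above, each base learner trains on every incoming example k ~ Poisson(λ) times; the imbalance-aware variants cited (e.g. oversampling-based Online Bagging) raise λ for minority-class examples so they are replayed more often. A hedged sketch, where the running-mean nearest-centroid member, the λ values, and the toy stream are all illustrative assumptions:

```python
import math
import random

def poisson(lam, rng):
    """Draw from a Poisson distribution via Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

class OnlineMember:
    """Incremental nearest-centroid learner: keeps a running mean per class."""
    def __init__(self):
        self.mean = {}    # class -> running mean vector
        self.count = {}   # class -> number of updates seen
    def update(self, x, y):
        n = self.count.get(y, 0) + 1
        m = self.mean.get(y, [0.0] * len(x))
        self.mean[y] = [mi + (xi - mi) / n for mi, xi in zip(m, x)]
        self.count[y] = n
    def predict(self, x):
        return min(self.mean,
                   key=lambda c: sum((a - b) ** 2
                                     for a, b in zip(x, self.mean[c])))

def online_bag_update(members, x, y, lam, rng):
    """Online Bagging step: each member trains on (x, y) k ~ Poisson(lam[y])
    times; a larger lam for the minority class oversamples it on the fly."""
    for m in members:
        for _ in range(poisson(lam[y], rng)):
            m.update(x, y)

def ensemble_predict(members, x):
    votes = {}
    for m in members:
        if m.mean:  # skip members that have not seen any data yet
            y = m.predict(x)
            votes[y] = votes.get(y, 0) + 1
    return max(votes, key=votes.get)

rng = random.Random(7)
members = [OnlineMember() for _ in range(5)]
lam = {0: 1.0, 1: 3.0}  # assumed λ per class: boost the minority (label 1)
stream = [((0.0, 0.1), 0), ((0.1, 0.0), 0), ((0.2, 0.1), 0),
          ((1.0, 1.0), 1), ((0.1, 0.2), 0), ((0.9, 1.1), 1)]
for x, y in stream:
    online_bag_update(members, x, y, lam, rng)
```

Because the Poisson weighting happens per example as it arrives, this style of resampling needs no access to past data, which is what makes it attractive for the streaming ensembles surveyed here.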