2021 IEEE International Conference on Big Data and Smart Computing (BigComp)
DOI: 10.1109/bigcomp51126.2021.00023
MP-Boost: Minipatch Boosting via Adaptive Feature and Observation Sampling

Abstract: Boosting methods are among the best general-purpose and off-the-shelf machine learning approaches, gaining widespread popularity. In this paper, we seek to develop a boosting method that yields comparable accuracy to popular AdaBoost and gradient boosting methods, yet is faster computationally and whose solution is more interpretable. We achieve this by developing MP-Boost, an algorithm loosely based on AdaBoost that learns by adaptively selecting small subsets of instances and features, or what we term minipatches…
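
As a rough, hedged illustration of the idea in the abstract (not the authors' MP-Boost algorithm): at each round, draw a tiny "minipatch" of observations and features, fit a weak learner on it, and adaptively up-weight observations the current ensemble misclassifies. Every function name, parameter, and the reweighting rule below are placeholder assumptions.

```python
# Illustrative sketch only -- NOT the MP-Boost algorithm from the paper.
# Assumes binary 0/1 labels and a simple AdaBoost-like reweighting heuristic.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def vote(learners, feat_sets, X):
    """Majority vote over weak learners, each restricted to its own feature subset."""
    votes = np.stack([t.predict(X[:, cols]) for t, cols in zip(learners, feat_sets)])
    return np.round(votes.mean(axis=0)).astype(int)

def minipatch_boost_sketch(X, y, n_rounds=100, n_obs=50, n_feat=10, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs_probs = np.full(n, 1.0 / n)           # adaptive observation-sampling weights
    learners, feat_sets = [], []
    for _ in range(n_rounds):
        rows = rng.choice(n, size=min(n_obs, n), replace=False, p=obs_probs)
        cols = rng.choice(p, size=min(n_feat, p), replace=False)
        tree = DecisionTreeClassifier(max_depth=3)
        tree.fit(X[np.ix_(rows, cols)], y[rows])
        learners.append(tree)
        feat_sets.append(cols)
        # Up-weight observations the ensemble currently misclassifies (assumed rule).
        wrong = vote(learners, feat_sets, X) != y
        obs_probs = np.where(wrong, obs_probs * 2.0, obs_probs)
        obs_probs /= obs_probs.sum()
    return learners, feat_sets
```

Because each weak learner only ever sees a small feature subset, tallying which features appear across the sampled minipatches gives a rough importance measure; the paper's actual adaptive feature-sampling scheme is not reproduced here.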

Cited by 6 publications (7 citation statements); references 17 publications.
“…We also use the FairStacks meta-learner to construct several weighted ensembles, shown as solid lines in Figure 2: i) random forest trees, i.e., the elements of a random forest ensemble; ii) an ensemble of 1000 minipatch decision trees, i.e. decision trees trained on small sub-samples of features and observations [42,38]; iii) learners implemented in the Scikit-Learn and XGBoost packages [35,7]; and iv) a kitchen-sink ensemble consisting of both the stand-alone methods and the previous ensembles.…”
Section: Empirical Studies (mentioning; confidence: 99%)
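
To make the quoted construction concrete, here is a hedged sketch of an ensemble of minipatch decision trees whose per-tree predictions could serve as columns for a weighted meta-learner such as FairStacks. The patch fractions, tree depth, and helper names are illustrative assumptions, not the cited papers' implementation.

```python
# Illustrative sketch: 1000 decision trees, each fit on a tiny random subset
# of observations and features ("minipatch"); details are assumed, not cited.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_minipatch_trees(X, y, n_trees=1000, obs_frac=0.1, feat_frac=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_obs, n_feat = max(2, int(obs_frac * n)), max(1, int(feat_frac * p))
    ensemble = []
    for _ in range(n_trees):
        rows = rng.choice(n, size=n_obs, replace=False)   # tiny row subsample
        cols = rng.choice(p, size=n_feat, replace=False)  # tiny column subsample
        tree = DecisionTreeClassifier(max_depth=3).fit(X[np.ix_(rows, cols)], y[rows])
        ensemble.append((tree, cols))
    return ensemble

def base_predictions(ensemble, X):
    """One column per minipatch tree; a meta-learner can learn weights over columns."""
    return np.column_stack([tree.predict(X[:, cols]) for tree, cols in ensemble])
```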
“…Inspired by the subsampling approach of J+aB, our inferential approach is rooted in an ensemble built by taking tiny random subsamples of both observations and features in tabular data. This idea of double subsampling appears first in the context of random forests [41,27], linear regression [37], and more recently has been termed "minipatch ensembles" by [65,80]. We adopt this idea of minipatch ensembles and we are the first to develop inferential approaches using this approach.…”
Section: Related Work (mentioning; confidence: 99%)
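
The excerpt does not spell out the inferential procedure itself. As one hedged illustration of what double subsampling enables, the sketch below forms, for each observation, a prediction averaged only over minipatches that did not sample it (a leave-one-observation-out prediction); the data structures and averaging rule are assumptions for illustration, not the cited paper's method.

```python
# Hedged illustration of leave-one-observation-out predictions from a minipatch
# ensemble; the exact inferential use of these predictions is not reproduced here.
import numpy as np

def loo_predictions(ensemble, patch_rows, X):
    """ensemble: list of (model, cols); patch_rows: row indices used by each patch."""
    n = X.shape[0]
    preds = np.full(n, np.nan)
    for i in range(n):
        vals = [model.predict(X[i:i + 1, cols])[0]
                for (model, cols), rows in zip(ensemble, patch_rows)
                if i not in rows]               # only patches that excluded point i
        if vals:                                # stays NaN if every patch used point i
            preds[i] = float(np.mean(vals))
    return preds
```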
“…Specifically, we seek to improve computational efficiency, provide interpretability in terms of feature importance, and at the same time improve clustering accuracy. We achieve these goals by leveraging the idea of minipatch learning [Yao and Allen, 2020, Yao et al., 2021, Toghani and Allen, 2021], which is an ensemble of learners trained on tiny subsamples of both observations and features. Compared to only subsampling observations in existing consensus clustering ensembles, by learning on many tiny data sets, our approach offers dramatic computational savings.…”
Section: Contributions (mentioning; confidence: 99%)
“…While the core of our approach is identical to that of consensus clustering, we offer three major methodological innovations in Steps 1 and 2 of Algorithm 1 that yield dramatically faster, more accurate, and interpretable results. Our first innovation is building cluster ensembles based on tiny subsets (typically 10% or less) of both observations and features, termed minipatches [Yao and Allen, 2020, Yao et al., 2021, Toghani and Allen, 2021]. Note that existing consensus clustering approaches form ensembles by subsampling typically 80% of observations and all the features for each ensemble member [Wilkerson and Hayes, 2010].…”
Section: Minipatch Consensus Clustering (mentioning; confidence: 99%)
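
To picture the kind of computation described in this excerpt, here is a hedged sketch of minipatch consensus clustering: cluster each tiny patch, tally how often co-sampled pairs of observations land in the same cluster, and run a final clustering on the resulting consensus matrix. The base clusterer, patch sizes, and linkage choice are illustrative assumptions and do not reproduce the cited Algorithm 1.

```python
# Illustrative sketch of minipatch consensus clustering; all specifics assumed.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans

def minipatch_consensus_sketch(X, k=3, n_patches=500, obs_frac=0.1, feat_frac=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_obs, n_feat = max(k + 1, int(obs_frac * n)), max(1, int(feat_frac * p))
    together = np.zeros((n, n))   # times a pair of observations was co-clustered
    sampled = np.zeros((n, n))    # times a pair appeared in the same minipatch
    for _ in range(n_patches):
        rows = rng.choice(n, size=n_obs, replace=False)
        cols = rng.choice(p, size=n_feat, replace=False)
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(X[np.ix_(rows, cols)])
        same = (labels[:, None] == labels[None, :]).astype(float)
        together[np.ix_(rows, rows)] += same
        sampled[np.ix_(rows, rows)] += 1.0
    consensus = np.divide(together, sampled,
                          out=np.zeros_like(together), where=sampled > 0)
    dist = 1.0 - consensus        # co-clustering dissimilarity
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=k, criterion="maxclust")
```

With patches this small, n_patches should be large enough that every observation is sampled at least once; in this sketch, observations that never appear in a patch simply keep an all-zero consensus row.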