2012
DOI: 10.1007/978-3-642-33460-3_28

Ensembles on Random Patches

Abstract: In this paper, we consider supervised learning under the assumption that the available memory is small compared to the dataset size. This general framework is relevant in the context of big data, distributed databases and embedded systems. We investigate a very simple, yet effective, ensemble framework that builds each individual model of the ensemble from a random patch of data obtained by drawing random subsets of both instances and features from the whole dataset. We carry out an extensive and systematic…
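The scheme the abstract describes can be reproduced with scikit-learn's BaggingClassifier, whose sampling options cover exactly this combination of instance and feature subsampling. The sketch below is illustrative only; the synthetic dataset, the 0.5/0.5 patch sizes, and the estimator count are arbitrary choices, not values from the paper.

```python
# A minimal sketch of the Random Patches scheme: every base
# estimator is fit on a random subset of both the instances (rows)
# and the features (columns) of the training set.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=40, random_state=0)

random_patches = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    max_samples=0.5,           # each tree sees 50% of the instances...
    max_features=0.5,          # ...and 50% of the features
    bootstrap=False,           # instances drawn without replacement
    bootstrap_features=False,  # features drawn without replacement
    random_state=0,
).fit(X, y)

print(random_patches.score(X, y))
```

Because every tree touches only its own patch, the per-estimator memory footprint is bounded by the patch size rather than by the full dataset, which is the memory-constrained setting the abstract targets.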

Cited by 111 publications (79 citation statements)
References 7 publications
“…To create the different random subsamples we used three different methods: bagging [33], pasting [34] and random patches [8]. Bagging [33] consists in drawing random bootstrap subsets of the original data. Pasting [34] is a similar method in which the random samples are drawn without replacement.…”
Section: Ensembles of Cost-Sensitive Decision Trees
confidence: 99%
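To make the three subsampling schemes named in the quoted statement concrete, here is a minimal NumPy sketch; the function names and the sample sizes are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 1000, 20

def bagging_indices(n, size):
    # Bagging: bootstrap sample, i.e. instances drawn WITH replacement.
    return rng.integers(0, n, size=size)

def pasting_indices(n, size):
    # Pasting: same idea, but instances drawn WITHOUT replacement.
    return rng.choice(n, size=size, replace=False)

def random_patch(n, p, n_sub, p_sub):
    # Random patches: subsample instances AND features jointly.
    rows = rng.choice(n, size=n_sub, replace=False)
    cols = rng.choice(p, size=p_sub, replace=False)
    return rows, cols

rows, cols = random_patch(n_samples, n_features, 500, 10)
# A patch of a matrix X would then be X[np.ix_(rows, cols)].
```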
“…The CSDT algorithm only creates one tree in order to make a classification; however, individual decision trees typically suffer from high variance [8]. A very efficient and simple way to address this flaw is to use them in the context of ensemble methods.…”
Section: Introduction
confidence: 99%
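A quick way to see the variance argument in this quote is to compare the spread of test scores of a single tree against a bagged ensemble over repeated random splits. The sketch below uses a synthetic dataset and arbitrary sizes, purely for illustration; the ensemble's scores typically vary far less across seeds.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

single, ensemble = [], []
for seed in range(10):
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=seed)
    # One fully grown tree: low bias, high variance across splits.
    single.append(
        DecisionTreeClassifier(random_state=seed).fit(Xtr, ytr).score(Xte, yte)
    )
    # Averaging 50 trees damps that variance.
    ensemble.append(
        BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                          random_state=seed).fit(Xtr, ytr).score(Xte, yte)
    )

print("single tree: mean %.3f, std %.3f" % (np.mean(single), np.std(single)))
print("ensemble:    mean %.3f, std %.3f" % (np.mean(ensemble), np.std(ensemble)))
```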
“…Another line of research considers variants that are tailored towards special cases. For instance, Louppe and Geurts [18] consider small subsets of the data, called patches. Each patch is based on a different subset of features and the overall ensemble consists of trees built independently on the patches.…”
Section: Large-Scale Construction
confidence: 99%
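The independent, per-patch construction this quote highlights can be sketched directly: each tree is fit on its own rows and columns, remembers its column subset, and predictions are combined by majority vote. Everything below (function names, patch sizes) is a hypothetical illustration under those assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_patch_ensemble(X, y, n_trees=25, n_rows=200, n_cols=5, seed=0):
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        # Each tree is built independently on its own patch.
        rows = rng.choice(len(X), size=n_rows, replace=False)
        cols = rng.choice(X.shape[1], size=n_cols, replace=False)
        tree = DecisionTreeClassifier().fit(X[np.ix_(rows, cols)], y[rows])
        trees.append((tree, cols))  # remember each tree's feature subset
    return trees

def predict(trees, X):
    # Each tree votes using only its own columns; labels are assumed
    # to be non-negative integers so that bincount works.
    votes = np.stack([t.predict(X[:, cols]) for t, cols in trees]).astype(int)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```

Since each tree needs only its patch in memory and the trees never interact, the training loop parallelizes trivially, which is what makes the construction attractive at scale.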
“…We can categorize big data approaches to decision tree induction as follows: building one big tree (Andrzejak et al, 2013;Panda et al, 2009;Ntoutsi et al, 2008;Zhang and Jiang, 2012;Pawlik and Augsten, 2011;Narlikar, 1998;Sreenivas et al, 2000;Goil and Choudhary, 2001;Amado et al, 2001;Domingos and Hulten, 2000;Dai and Ji, 2014), transferring all decision trees into one rule base and back into a decision tree, ensemble approaches (Louppe and Geurts, 2012;Hansen and Salamon, 1990;Sollich and Krogh, 1996;Breiman, 1999), and others (e.g., Kargupta and Park, 2004) that do not build a new tree and use a combination of tree results. According to Ben-Haim and Tom-Tov (2010), another way to categorize the different types of algorithms for handling large datasets is to divide them into the following two groups: pre-sorting of data and using approximate representations of data.…”
Section: Background and Related Work
confidence: 99%