2017
DOI: 10.1016/j.bdr.2017.07.003

Random Forests for Big Data

Abstract: Big Data is one of the major challenges of statistical science and has numerous consequences from algorithmic and theoretical viewpoints. Big Data always involve massive data but they also often include online data and data heterogeneity. Recently some statistical methods have been adapted to process Big Data, like linear regression models, clustering methods and bootstrapping schemes. Based on decision trees combined with aggregation and bootstrap ideas, random forests were introduced b…

Cited by 284 publications (137 citation statements)
References: 35 publications
“…Addressing global sparsity is a challenge in decision trees and, to the best of our knowledge, this has not been tackled appropriately in the literature. Standard CARTs or Random Forests (RFs) [5,7,13,16] cannot manage it due to the greedy construction of the trees. Nonetheless, some attempts have been made, see [11,12].…”
Section: Introduction (mentioning)
confidence: 99%
“…The S-ORCT smooth formulation (9)-(16) has been implemented using Pyomo optimization modeling language [19,20] in Python 3.5 [31]. As solver, we have used IPOPT 3.11.1 [39], and have followed a multistart approach, where the process is repeated 20 times starting from different random initial solutions.…”
Section: Introduction (mentioning)
confidence: 99%
“…Addressing the leading challenges of statistical science, it serves to broaden not only the algorithmic but also the theoretical perspective [29,30,40]. It always involves massive data, including data streams and data heterogeneity.…”
Section: Big Data for M2M Network (mentioning)
confidence: 99%
“…In addition, since its major key features are multiple sources, huge volume, and fast-changing nature, it is difficult for commonly used traditional computing methods such as machine learning, information retrieval and data mining to efficiently support the processing, analysis and computation of Big Data [40]. Therefore, in recent years, statistical methods such as clustering methods, linear regression models and bootstrapping schemes have been adapted to process Big Data [29]. Generally, Big Data can be classified based on the five main aspects of data and its content: source, store, format, staging, and processing [11,39].…”
Section: Big Data for M2M Network (mentioning)
confidence: 99%
“…Random forest classification algorithm [10] is proposed by Leo Breiman and Adele Cutler et al in the 1980s. Its main idea is integration thought, which is based on the development of decision trees, and includes multiple decision trees.…”
Section: Random Forest (mentioning)
confidence: 99%