2019
DOI: 10.1186/s40537-019-0186-3

Selecting a representative decision tree from an ensemble of decision-tree models for fast big data classification

Abstract: With the vast growth of information volume and variety in the recent years, many organizations focus on big data platforms and technologies [6]. In order to train machine learning algorithms on big data there is a need for a distributed framework such as MAPREDUCE, which can induce in parallel multiple models out of small subsets of massive-scale training data, which cannot fit into the memory of a single machine. Here, we limit our discussion to the model combining phase of distributed data processing, known …

Cited by 39 publications (21 citation statements)
References 34 publications

“…The first ensemble method was used by combining two linear regression models and nearest neighbor method to enhance the performance of recognition systems [344]. To determine the target variable, decision trees are very often used as ensemble method [345,346].…”
Section: Ensemble Based Methods (mentioning)
confidence: 99%
“…Both RF and XGBoost are based on the classification and regression trees (CART). CART has a well-known problem related to the model instability in which the tree structure changes significantly by small change in training data [67]. RF and XGBoost adopt different approaches to mitigating the weakness of CART.…”
Section: Machine Learning Approaches (mentioning)
confidence: 99%
“…Text to speech conversion in the case of CRM automation, hard real-time applications, aircraft systems, fraud detection during online transactions, dynamic systems, and other applications require fast classification to take timely necessary remedial actions. In big data scenarios, in the Map phase, if it is required to predict the class label for a huge number of instances, an efficient method is much useful to save the processing time [16].…”
Section: Introduction (mentioning)
confidence: 99%
“…Ahmad Ashari et al have studied the classification speed of three standard classifiers [17] and opined that the decision tree's performance is relatively high. Weinberg et al [16] have proposed a method to improve classification speed when the ensemble of decision trees is constructed on big data. Their method finds the best representative tree from the ensemble on which classification is performed.…”
Section: Introduction (mentioning)
confidence: 99%
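
The idea in the last statement, classifying with a single representative tree instead of the whole ensemble, can be illustrated with a minimal sketch. The selection rule used here (pick the tree whose predictions agree most often with the ensemble's majority vote) is an assumption made for illustration and is not necessarily the criterion used by Weinberg et al. The names `majority_vote`, `select_representative` and `make_stump` are hypothetical, and the "trees" are toy one-feature threshold rules.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A "tree" here is any callable mapping a feature vector to a class label.
Tree = Callable[[Sequence[float]], int]


def majority_vote(trees: List[Tree], x: Sequence[float]) -> int:
    """Label predicted by most trees in the ensemble for one instance."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


def select_representative(trees: List[Tree],
                          samples: List[Sequence[float]]) -> int:
    """Index of the tree that agrees most often with the ensemble vote.

    Assumption: agreement with the majority vote on a sample set is the
    selection criterion; the cited paper may use a different measure.
    """
    ensemble_labels = [majority_vote(trees, x) for x in samples]

    def agreement(tree: Tree) -> int:
        return sum(tree(x) == y for x, y in zip(samples, ensemble_labels))

    return max(range(len(trees)), key=lambda i: agreement(trees[i]))


def make_stump(feature: int, threshold: float) -> Tree:
    """Toy depth-1 tree: predicts 1 when the feature exceeds the threshold."""
    return lambda x: int(x[feature] > threshold)


# Three stumps standing in for trees trained on different data subsets.
trees = [make_stump(0, 2), make_stump(0, 5), make_stump(0, 8)]
samples = [[v] for v in range(10)]

best = select_representative(trees, samples)
fast_classifier = trees[best]  # classify with one tree instead of all three
```

After selection, each new instance costs a single tree traversal rather than one traversal per ensemble member, which is where the classification speed-up comes from.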