2017
DOI: 10.1007/s10844-017-0457-4

Semi-supervised classification trees

Abstract: In many real-life problems, obtaining labelled data can be a very expensive and laborious task, while unlabelled data can be abundant. The limited availability of labelled data can seriously restrict the performance of supervised learning methods. Here, we propose a semi-supervised classification tree induction algorithm that can exploit both the labelled and unlabelled data, while preserving all of the appealing characteristics of standard supervised decision trees: being non-parametric, efficient, having good predictive performance…
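To make the abstract's idea concrete, below is a minimal Python sketch of greedy split selection in which labelled and unlabelled rows are both passed to the impurity criterion, so the criterion can exploit either. The function name `best_split`, the impurity callable's signature, and the exhaustive threshold enumeration are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def best_split(X, y, labelled_mask, impurity):
    """Greedy search for the axis-aligned split that most reduces a
    (semi-)supervised impurity. `impurity(X, y, labelled_mask)` scores a
    set of rows; unlabelled rows are included so the criterion can use
    them alongside the class labels."""
    parent = impurity(X, y, labelled_mask)
    best = (None, None, 0.0)  # (feature index, threshold, impurity reduction)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # candidate thresholds
            left = X[:, j] <= t
            right = ~left
            if not left.any() or not right.any():
                continue
            # size-weighted impurity of the two children
            child = (left.mean() * impurity(X[left], y[left], labelled_mask[left])
                     + right.mean() * impurity(X[right], y[right], labelled_mask[right]))
            if parent - child > best[2]:
                best = (j, t, parent - child)
    return best
```

Recursing on the returned split until a stopping criterion is met (for example, a minimum number of labelled examples per node) yields the tree.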

Cited by 47 publications (26 citation statements). References 23 publications.
“…They conducted experiments with random forests consisting of 100 of the resulting semi-supervised decision trees and observed significant performance improvements over supervised random forests on several data sets. Levatić et al. (2017) introduced a more generic framework for using unlabelled data in the splitting criterion by constructing an impurity measure for unlabelled data. In their experiments, they promoted feature consistency within the data subsets on each side of the splitting boundary, penalizing the empirical variance of numerical features and the Gini impurity of nominal features.…”
Section: Density Regularization (citation type: mentioning)
confidence: 99%
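The impurity measure described in this statement can be sketched as a weighted sum of (i) the Gini impurity of the class labels over the labelled examples and (ii) per-feature impurity over all examples: empirical variance for numeric features, Gini impurity for nominal ones. The weight `w`, the normalisation of each variance by its full-training-set value, the unweighted average over features, and the integer coding of nominal features are assumptions made for illustration; see Levatić et al. (2017) for the exact formulation.

```python
import numpy as np

def gini(values):
    """Gini impurity of a 1-D array of discrete values."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def ssl_impurity(X, y, labelled_mask, w, feature_is_numeric, total_vars):
    """Semi-supervised node impurity: w weighs label impurity on the
    labelled rows against feature impurity on all rows. Nominal features
    are assumed integer-coded; `total_vars[j]` is feature j's variance on
    the full training set, used to normalise the variance term."""
    # (i) supervised term: Gini impurity of the class labels
    label_term = gini(y[labelled_mask]) if labelled_mask.any() else 0.0

    # (ii) unsupervised term: average per-feature impurity over all rows
    terms = []
    for j in range(X.shape[1]):
        col = X[:, j]
        if feature_is_numeric[j]:
            # normalised empirical variance of a numeric feature
            terms.append(np.var(col) / total_vars[j] if total_vars[j] > 0 else 0.0)
        else:
            # Gini impurity of a nominal feature
            terms.append(gini(col))
    feature_term = float(np.mean(terms))

    return w * label_term + (1.0 - w) * feature_term
```

Partially applying `w`, `feature_is_numeric` and `total_vars` (e.g. with `functools.partial`) gives a callable matching the `impurity` argument of the split-search sketch above.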
“…The use of a predictive tool could assist financial institutions in deciding whether to grant credit to applicants. Since our numerical experiments are quite encouraging, our future work will concentrate on evaluating the proposed algorithms against relevant methodologies and frameworks addressing the credit-scoring problem, such as [27][28][29][30][31][32], and against recently proposed advanced SSL algorithms such as [59][60][61].…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…This can be used in different tasks such as classification, regression and other analyses, as decision trees improve forecasting models and can also be combined with one another (Rokach, 2016). There are several studies on how to build a decision tree, such as Luštrek et al. (2016), Levatić et al. (2017) and Strnad and Nerat (2016), among others. A situation can be modelled to guide more efficient decision-making, with predictive performance slightly better than that of standard algorithms (González, Herrera and Garcia, 2015).…”
Section: Machine Learning and Methods Employed (citation type: mentioning)
confidence: 99%