2017
DOI: 10.1007/s10844-017-0457-4

Semi-supervised classification trees

Abstract: In many real-life problems, obtaining labelled data can be a very expensive and laborious task, while unlabelled data can be abundant. The limited availability of labelled data can seriously restrict the performance of supervised learning methods. Here, we propose a semi-supervised classification tree induction algorithm that can exploit both the labelled and unlabelled data, while preserving all of the appealing characteristics of standard supervised decision trees: being non-parametric, efficient, having good predictive performance…
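To make the abstract's idea concrete, below is a minimal Python sketch of greedy split selection in which labelled and unlabelled rows are both passed to the impurity criterion, so the criterion can exploit either. The function name `best_split`, the impurity callable's signature, and the exhaustive threshold enumeration are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def best_split(X, y, labelled_mask, impurity):
    """Greedy search for the axis-aligned split that most reduces a
    (semi-)supervised impurity. `impurity(X, y, labelled_mask)` scores a
    set of rows; unlabelled rows are included so the criterion can use
    them alongside the class labels."""
    parent = impurity(X, y, labelled_mask)
    best = (None, None, 0.0)  # (feature index, threshold, impurity reduction)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # candidate thresholds
            left = X[:, j] <= t
            right = ~left
            if not left.any() or not right.any():
                continue
            # size-weighted impurity of the two children
            child = (left.mean() * impurity(X[left], y[left], labelled_mask[left])
                     + right.mean() * impurity(X[right], y[right], labelled_mask[right]))
            if parent - child > best[2]:
                best = (j, t, parent - child)
    return best
```

Recursing on the returned split until a stopping criterion is met (for example, a minimum number of labelled examples per node) yields the tree.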

Cited by 47 publications (26 citation statements). References 23 publications.
“…They conducted experiments with random forests consisting of 100 of the resulting semi-supervised decision trees and observed significant performance improvements over supervised random forests on several data sets. Levatić et al. (2017) introduced a more generic framework for using unlabelled data in the splitting criterion by constructing an impurity measure for unlabelled data. In their experiments, they promoted feature consistency within the data subsets on each side of the splitting boundary, penalizing the empirical variance of numerical features and the Gini impurity of nominal features.…”
Section: Density Regularization (citation type: mentioning)
confidence: 99%
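The impurity measure described in this statement can be sketched as a weighted sum of (i) the Gini impurity of the class labels over the labelled examples and (ii) per-feature impurity over all examples: empirical variance for numeric features, Gini impurity for nominal ones. The weight `w`, the normalisation of each variance by its full-training-set value, the unweighted average over features, and the integer coding of nominal features are assumptions made for illustration; see Levatić et al. (2017) for the exact formulation.

```python
import numpy as np

def gini(values):
    """Gini impurity of a 1-D array of discrete values."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def ssl_impurity(X, y, labelled_mask, w, feature_is_numeric, total_vars):
    """Semi-supervised node impurity: w weighs label impurity on the
    labelled rows against feature impurity on all rows. Nominal features
    are assumed integer-coded; `total_vars[j]` is feature j's variance on
    the full training set, used to normalise the variance term."""
    # (i) supervised term: Gini impurity of the class labels
    label_term = gini(y[labelled_mask]) if labelled_mask.any() else 0.0

    # (ii) unsupervised term: average per-feature impurity over all rows
    terms = []
    for j in range(X.shape[1]):
        col = X[:, j]
        if feature_is_numeric[j]:
            # normalised empirical variance of a numeric feature
            terms.append(np.var(col) / total_vars[j] if total_vars[j] > 0 else 0.0)
        else:
            # Gini impurity of a nominal feature
            terms.append(gini(col))
    feature_term = float(np.mean(terms))

    return w * label_term + (1.0 - w) * feature_term
```

Partially applying `w`, `feature_is_numeric` and `total_vars` (e.g. with `functools.partial`) gives a callable matching the `impurity` argument of the split-search sketch above.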
“…The use of a predictive tool could assist financial institutions in deciding whether to grant credit to applicants. Since our numerical experiments are quite encouraging, our future work will concentrate on evaluating the proposed algorithms against relevant methodologies and frameworks addressing the credit-scoring problem, such as [27][28][29][30][31][32], and against recently proposed advanced SSL algorithms such as [59][60][61].…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…This can be used in different tasks such as classification, regression and other analyses, as decision trees improve forecasting models and can also be combined with one another (Rokach, 2016). There are several studies on how to build a decision tree, such as Luštrek et al. (2016), Levatić et al. (2017) and Strnad and Nerat (2016), among others. A situation can be modelled to guide more efficient decision-making, with predictive performance slightly better than that of standard algorithms (González, Herrera and Garcia, 2015).…”
Section: Machine Learning and Methods Employed (citation type: mentioning)
confidence: 99%