2018
DOI: 10.1108/jm2-01-2017-0002

Credit risk assessment for unbalanced datasets based on data mining, artificial neural network and support vector machines

Abstract: Purpose: Credit scoring datasets are generally unbalanced. The number of repaid loans is higher than that of defaulted ones. Therefore, the classification of these data is biased toward the majority class, which practically means that it tends to attribute a mistaken “good borrower” status even to “very risky borrowers”. In addition to the use of statistics and machine learning classifiers, this paper aims to explore the relevance and performance of sampling models combined with statistical prediction and artif…
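The bias toward the majority class described in the abstract can be illustrated with a small worked example. This is only a sketch on hypothetical toy data (90 repaid loans, 10 defaults, numbers chosen for illustration and not taken from the paper): a classifier that always predicts "good borrower" looks accurate overall while catching no defaulters.

```python
# Hypothetical toy data, not from the paper: 1 = repaid ("good"), 0 = defaulted ("risky").
labels = [1] * 90 + [0] * 10

# A degenerate classifier that always predicts the majority class.
predictions = [1] * len(labels)

# Overall accuracy: share of loans whose predicted status matches the true one.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall on the default class: share of actual defaulters that were flagged.
preds_for_defaults = [p for p, y in zip(predictions, labels) if y == 0]
default_recall = sum(p == 0 for p in preds_for_defaults) / len(preds_for_defaults)

print(f"accuracy={accuracy:.2f}, default recall={default_recall:.2f}")
# prints "accuracy=0.90, default recall=0.00"
```

This is why results on unbalanced credit data are usually reported per class rather than as accuracy alone.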

Citations: cited by 56 publications (39 citation statements)
References: 53 publications
“…This model, compared with other linear models, showed better prediction accuracy because it reduces the influence of irrelevant features. In (Khemakhem et al. 2018), the authors assessed credit risk using linear regression, SVM and neural networks. Their work compares performance indicators of the prediction methods before and after data balancing.…”
Section: Related Work | mentioning
confidence: 99%
“…Thus, in our research, we are using a unique dataset of the credit registry, and we present all the necessary steps from data collection to prediction and evaluation. On this dataset, we train and compare models built with the most used machine-learning algorithms; additionally, we consider a sampling strategy for data balancing, such as the approach in (Khemakhem et al. 2018). Even though the dataset that we exploit in this paper gives an added value to this research and its results, the main drawback is that we did not compare the results with any other similar dataset, because it is impossible to obtain such datasets from neighboring or any other central bank.…”
Section: Related Work | mentioning
confidence: 99%
“…(Some examples of images eliminated from the dataset) However, applying deep learning algorithms requires a large amount of meaningful, labeled data [41]. In addition, to classify the chosen dataset better, the number of samples in the classes that have few samples must be increased [42][43][44]. For this reason, data augmentation and standardization of the data size are needed to achieve a balanced class distribution.…”
Section: Figure 3. Some Examples of Images Eliminated from the Dataset | unclassified
“…To tackle the imbalance problem in credit scoring data, many studies employed resampling techniques, such as under-sampling and over-sampling [3][4][5][6][7][8]. The major disadvantage of resampling techniques is that they lead to overhead cost and other consequent problems, e.g., 1) information may be lost using under-sampling techniques, 2) the final model may be overfitted using over-sampling techniques, 3) the original data distribution may be changed, and 4) the model is more complex and has a high computational cost.…”
Section: Introduction | mentioning
confidence: 99%
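The two resampling families named in the statement above can be sketched in a few lines. This is a minimal illustration of plain random under-sampling and random over-sampling, assuming hypothetical toy data and helper names (`group_by_class`, `random_undersample`, `random_oversample`); it is not the procedure of any cited study. It also makes the listed drawbacks visible: under-sampling discards majority rows, and over-sampling repeats minority rows.

```python
import random
from collections import Counter


def group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the rarest class."""
    rng = random.Random(seed)
    groups = group_by_class(rows, labels)
    target = min(len(g) for g in groups.values())
    # rng.sample draws without replacement, so surplus majority rows are dropped.
    return [(row, label) for label, g in groups.items()
            for row in rng.sample(g, target)]


def random_oversample(rows, labels, seed=0):
    """Duplicate random rows of each class until it matches the largest class."""
    rng = random.Random(seed)
    groups = group_by_class(rows, labels)
    target = max(len(g) for g in groups.values())
    out = []
    for label, g in groups.items():
        # rng.choice draws with replacement, so minority rows are repeated.
        extra = [rng.choice(g) for _ in range(target - len(g))]
        out.extend((row, label) for row in g + extra)
    return out


# Hypothetical toy data: 90 repaid loans (1) and 10 defaults (0).
rows = list(range(100))
labels = [1] * 90 + [0] * 10

under = Counter(label for _, label in random_undersample(rows, labels))
over = Counter(label for _, label in random_oversample(rows, labels))
print(dict(under))  # {1: 10, 0: 10}
print(dict(over))   # {1: 90, 0: 90}
```

In practice, resampling would be applied to the training split only, so that evaluation still reflects the original class distribution.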