Most existing studies on credit scoring adopt classifier ensembles to handle imbalanced datasets. They apply resampling methods to generate multiple training subsets, from which multiple base classifiers are constructed. However, this approach causes several problems that degrade classification performance, such as information loss, model overfitting, and high computational cost. We therefore propose a novel ensemble approach for developing a credit scoring model based on cost-sensitive neural networks, called Cost-sensitive Neural Network Ensemble (CS-NNE). In the proposed approach, multiple class weights are applied to the original training data, enabling the base neural networks to account for the imbalanced classes. In this way, high diversity among the base classifiers can be achieved without the problems that resampling introduces. The approach's effectiveness is evaluated on five real-world credit datasets. One of them is a loan application dataset provided by a financial institution in Thailand; the remaining datasets are publicly available and widely used in existing studies. The experimental results show that the proposed CS-NNE approach improves predictive performance over a single neural network on imbalanced credit datasets. On the Thai credit dataset, for example, it yields gains of 1.36%, 15.67%, and 6.11% in Area under the ROC Curve (AUC), Default Detection Rate (DDR), and G-Mean (GM), respectively, and achieves the best Misclassification Cost (MC). The proposed CS-NNE approach effectively addresses the class imbalance problem and outperforms many existing models. The prediction model strikes a good balance between the default class (bad credit applicants) and the non-default class (good credit applicants), whereas existing approaches favor the non-default class over the default class (high specificity and low DDR), which results in non-performing loans (NPL).
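The idea of training every base learner on the full original data, each with a different class weight, can be sketched as follows. This is a minimal illustration under stated assumptions, not the CS-NNE implementation: a single sigmoid unit stands in for each base neural network, the weight values in `pos_weights` and the toy data are hypothetical, and member outputs are combined by simple averaging, which the abstract does not specify.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_weighted_unit(X, y, pos_weight, epochs=200, lr=0.1):
    """Fit one sigmoid unit by gradient descent on class-weighted log loss.

    Errors on the minority (default, y == 1) class are multiplied by
    pos_weight; errors on the majority class keep weight 1.0.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        total_weight = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            cw = pos_weight if yi == 1 else 1.0
            err = cw * (p - yi)
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
            total_weight += cw
        w = [wj - lr * gj / total_weight for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / total_weight
    return w, b


def predict_proba(model, x):
    """Predicted probability of the default class for one applicant."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def ensemble_proba(models, x):
    """Average the member probabilities (an assumed combination rule)."""
    return sum(predict_proba(m, x) for m in models) / len(models)


# Hypothetical imbalanced toy data: 8 non-default (0), 2 default (1).
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [0.9], [1.0]]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# One base learner per class weight; no resampling, so no data is dropped.
pos_weights = [1.0, 2.0, 4.0, 8.0]
models = [train_weighted_unit(X, y, pw) for pw in pos_weights]

member_probs = [predict_proba(m, [0.85]) for m in models]
ensemble_prob = ensemble_proba(models, [0.85])
```

Because each member sees all training examples and differs only in how heavily it penalizes missed defaults, diversity comes from the weights rather than from discarding or duplicating data.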
Several credit-scoring models have been developed using ensemble classifiers to improve the accuracy of assessment. However, little attention has been paid to tuning the hyper-parameters of the base learners, although these are crucial to constructing ensemble models. This study proposes an improved credit scoring model based on the extreme gradient boosting (XGB) classifier with Bayesian hyper-parameter optimization (XGB-BO). The model comprises two steps. First, data pre-processing handles missing values and scales the data. Second, Bayesian hyper-parameter optimization tunes the hyper-parameters of the XGB classifier, and the tuned hyper-parameters are used to train the model. The model is evaluated on four widely used public datasets: the German, Australian, Lending Club, and Polish datasets. Several state-of-the-art classification algorithms are implemented for comparison with the proposed method. The proposed model shows promising results, with accuracy improvements of 4.10%, 3.03%, and 2.76% on the German, Lending Club, and Australian datasets, respectively. It outperforms commonly used techniques, e.g., decision tree, support vector machine, neural network, logistic regression, random forest, and bagging. The experimental results confirm that the XGB-BO model is suitable for assessing the creditworthiness of applicants.
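The first step of the pipeline, handling missing values and scaling, might look like the sketch below. The abstract does not say which imputation or scaling method is used, so median imputation and min-max scaling are assumptions here, and the `income` column is hypothetical. The Bayesian optimization step is not shown.

```python
from statistics import median


def impute_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    fill = median(observed)
    return [fill if v is None else v for v in column]


def min_max_scale(column):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical feature column with one missing value.
income = [1000, None, 3000, 2000]
scaled_income = min_max_scale(impute_median(income))
# scaled_income == [0.0, 0.5, 1.0, 0.5]
```

In practice the fill value and the scaling bounds would be computed on the training split only and then reused on validation and test data, so that no information leaks into the tuning step.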
Market segmentation is an important tool for driving an organization toward its goals. This study proposes a market segmentation technique that combines unsupervised and supervised learning. The method aims to cluster international tourists who arrived in Thailand for business purposes, and to classify business tourists using the output of the unsupervised learning technique as class labels. A Self-Organizing Map (SOM), K-Means, and hierarchical clustering were applied, and the best segmentation was selected using the Silhouette index. The resulting segment labels served as class labels for the supervised learning part. Multilayer Perceptron (MLP), J48 decision tree, Decision Table, OneR, and Naïve Bayes classifiers were used to classify the business tourist dataset, and the best-performing technique was selected. The experimental results indicated that K-Means outperformed the other clustering techniques and produced five distinct segments. Moreover, the Naïve Bayes classifier gave the best performance among the classifiers on the business tourist variables. Thus, this model can be used to predict the segment of newly arriving business tourists.
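Selecting a segmentation by the Silhouette index can be illustrated with a small pure-Python sketch. It uses the standard definition of the silhouette coefficient with Euclidean distance; the toy points and the two candidate partitions (`partition_a`, `partition_b`) are hypothetical stand-ins for the outputs of different clustering algorithms.

```python
import math


def silhouette_index(points, labels):
    """Mean silhouette coefficient; labels must define at least two clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Two well-separated groups of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]

# Hypothetical label assignments from two clustering runs.
candidates = {
    "partition_a": [0, 0, 1, 1],  # groups nearby points together
    "partition_b": [0, 1, 0, 1],  # splits each natural group
}

# Keep the segmentation with the highest Silhouette index.
best_name = max(candidates, key=lambda name: silhouette_index(points, candidates[name]))
segment_labels = candidates[best_name]
```

The winning `segment_labels` would then serve as the class labels for the supervised stage, where a classifier learns to assign new arrivals to a segment.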
Bankruptcy prediction models suffer from the class imbalance problem, which limits their performance. Most prior research addressed the problem by applying resampling methods such as the synthetic minority oversampling technique (SMOTE). However, resampling methods introduce other issues, e.g., additional noisy data and longer training time. To improve bankruptcy prediction, we propose cost-sensitive extreme gradient boosting (CS-XGB), which addresses the class imbalance problem without requiring any resampling method. The proposed method’s effectiveness is evaluated on six real-world datasets, i.e., the LendingClub dataset and five Polish companies’ bankruptcy datasets. This research compares the performance of CS-XGB with other ensemble methods, including SMOTE-XGB, which applies SMOTE to the training set before the learning process. The experimental results show that i) on LendingClub, CS-XGB improves on XGBoost and SMOTE-XGB by more than 50% and 33% in bankruptcy detection rate (BDR) and geometric mean (GM), respectively, and ii) on the five Polish datasets, the CS-XGB model outperforms random forest (RF), Bagging, AdaBoost, XGBoost, and SMOTE-XGB in terms of BDR, GM, and the area under the receiver operating characteristic curve (AUC). In addition, the CS-XGB model achieves good overall prediction results.
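One common way to make gradient boosting cost-sensitive without resampling is to weight the loss of minority-class examples. The sketch below shows a class-weighted logistic loss in the (gradient, Hessian) form that boosting libraries work with. It is an illustration under assumptions: the abstract does not state how CS-XGB sets its costs, and the negative-to-positive ratio used for `pos_weight` is only a widely used starting heuristic. The helper names are hypothetical.

```python
import math


def pos_weight_from_counts(labels):
    """Ratio of negatives to positives, a common starting minority-class weight."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive (bankrupt) examples")
    return n_neg / n_pos


def weighted_logloss_grad_hess(raw_scores, labels, pos_weight):
    """Per-example gradient and Hessian of class-weighted log loss.

    Derivatives are taken with respect to the raw (pre-sigmoid) score.
    Bankrupt examples (y == 1) are scaled by pos_weight, others by 1.0.
    """
    grads, hessians = [], []
    for s, y in zip(raw_scores, labels):
        p = 1.0 / (1.0 + math.exp(-s))
        w = pos_weight if y == 1 else 1.0
        grads.append(w * (p - y))
        hessians.append(w * p * (1.0 - p))
    return grads, hessians


# Hypothetical imbalanced labels: 8 healthy firms (0), 2 bankrupt firms (1).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
pos_weight = pos_weight_from_counts(labels)  # 8 / 2 = 4.0

# Gradients at the initial raw score of 0.0 (predicted probability 0.5).
grads, hessians = weighted_logloss_grad_hess([0.0] * len(labels), labels, pos_weight)
```

With this weight the two bankrupt examples pull on the model exactly as hard as the eight healthy ones, so the gradients sum to zero at the starting point instead of favoring the majority class. In XGBoost, the `scale_pos_weight` parameter plays a comparable role for binary classification.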