Background: Credit scoring models are an effective tool for banks and other financial institutions to identify potential default borrowers. Credit scoring models based on machine learning methods such as deep learning perform well in terms of default-discrimination accuracy, but they also have shortcomings, such as many hyperparameters and a heavy dependence on big data, and there is still considerable room to improve their interpretability and robustness. Methods: The deep forest, or Multi-Grained Cascade Forest (gcForest), is a deep model built from decision trees on the basis of the random forest algorithm. Using multi-grained scanning and cascade processing, gcForest can effectively identify and process high-dimensional feature information; at the same time, it has few hyperparameters and strong robustness. This paper therefore constructs a two-stage hybrid default discrimination model based on multiple feature selection methods and the gcForest algorithm, and optimizes its parameters with the lowest type II error as the first principle and the highest AUC and accuracy as the second and third principles. gcForest not only retains the advantages of traditional statistical models in interpretability and robustness but also matches the accuracy advantages of deep learning models. Results: The validity of the hybrid default discrimination model is verified on three real open credit datasets (Australian, Japanese, and German) from the UCI repository. Conclusion: gcForest outperforms currently popular single classifiers such as ANN and common ensemble classifiers such as LightGBM and CNNs in type II error, AUC, and accuracy. Comparison with other similar research results further verifies the model's robustness and effectiveness.
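The cascade idea behind gcForest can be illustrated with a minimal sketch: each layer trains several forests, and their class-probability vectors are concatenated to the original features and passed to the next layer. This is a simplified assumption-laden illustration built on scikit-learn (it omits multi-grained scanning and the k-fold feature generation the real gcForest uses), not the authors' implementation.

```python
# Minimal gcForest-style cascade sketch (assumptions: two forests per
# layer, a fixed number of layers, no multi-grained scanning; the real
# gcForest generates augmented features with k-fold cross-validation).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def cascade_fit_predict(X_tr, y_tr, X_te, n_layers=3):
    """Each layer's class-probability outputs augment the raw features
    fed to the next layer; the last layer's probabilities are averaged."""
    aug_tr, aug_te = X_tr, X_te
    for _ in range(n_layers):
        probas_tr, probas_te = [], []
        for Forest in (RandomForestClassifier, ExtraTreesClassifier):
            f = Forest(n_estimators=100, random_state=0).fit(aug_tr, y_tr)
            probas_tr.append(f.predict_proba(aug_tr))
            probas_te.append(f.predict_proba(aug_te))
        # Concatenate probability vectors to the ORIGINAL feature matrix.
        aug_tr = np.hstack([X_tr] + probas_tr)
        aug_te = np.hstack([X_te] + probas_te)
    # Final prediction: average the last layer's probability vectors.
    return np.mean(probas_te, axis=0).argmax(axis=1)

pred = cascade_fit_predict(X_tr, y_tr, X_te)
print(round(accuracy_score(y_te, pred), 3))
```

In the published algorithm the cascade depth is not fixed but grows until validation performance stops improving, which is one reason gcForest needs so few hyperparameters.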
Assessing customer default is an essential basis for personal credit issuance. This paper develops a personal credit default discrimination model based on a Super Learner heterogeneous ensemble to improve the accuracy and robustness of default discrimination. First, we select six single classifiers, such as logistic regression and SVM, and three homogeneous ensemble classifiers, such as random forest, to build a base classifier candidate library for the Super Learner. Then, we train the base classifiers with ten-fold cross-validation to improve their robustness. We compute each base classifier's total loss from the difference between its predicted and actual values and establish a base-classifier-weighted optimization model that solves for the optimal weights minimizing the weighted total loss of all base classifiers, which yields the heterogeneous ensemble Super Learner classifier. Finally, we test the ensemble Super Learner model's effectiveness on three real credit datasets in the UCI repository (Australian, Japanese, and German) and on the large GMSC credit dataset published on the Kaggle platform, using four commonly used evaluation indicators: accuracy, type I error rate, type II error rate, and AUC. Compared with the base classifiers' classification results and with heterogeneous models such as Stacking and Bstacking, the results show that the ensemble Super Learner model achieves higher discrimination accuracy and robustness.
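The weighting step described above can be sketched as a small constrained optimization: obtain ten-fold cross-validated predictions for each base classifier, then solve for simplex-constrained weights that minimize the ensemble loss. This is a hedged sketch only; the paper's exact loss function, solver, and nine-member candidate library may differ (squared loss, SLSQP, and a three-classifier library are assumptions here).

```python
# Super Learner weighting sketch (assumptions: squared loss, weights on
# the probability simplex, scipy SLSQP solver, three base classifiers
# standing in for the paper's candidate library of nine).
import numpy as np
from scipy.optimize import minimize
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=500, n_features=15, random_state=0)

# Ten-fold cross-validated probability predictions per base classifier,
# so the weight fit never sees in-sample predictions.
bases = [LogisticRegression(max_iter=1000),
         SVC(probability=True, random_state=0),
         RandomForestClassifier(random_state=0)]
Z = np.column_stack([cross_val_predict(m, X, y, cv=10,
                                       method="predict_proba")[:, 1]
                     for m in bases])

# Minimize weighted squared loss subject to w >= 0 and sum(w) == 1.
loss = lambda w: np.mean((Z @ w - y) ** 2)
res = minimize(loss, x0=np.full(3, 1 / 3), method="SLSQP",
               bounds=[(0, 1)] * 3,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
weights = res.x
print(np.round(weights, 3))
```

The simplex constraint keeps the combined output a valid probability and typically drives the weights of weak base classifiers toward zero, which is how the ensemble adapts to each dataset.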
Background: Personal credit default discrimination measures the size of credit default risk and provides an essential decision-making basis for banks. Methods: This article constructs a three-stage default discrimination model based on DF21. In the first stage, the article selects the feature combination. It obtains default prediction results by traversing the number of decision trees from 20 to 500 and the learning rate from 0.08 to 0.12 in XGBoost. Taking the lowest Type II error as the first principle and the highest AUC and accuracy as the second and third principles (the TAA principle), it selects the optimal number of trees and learning rate and obtains the feature importances, then uses forward selection to determine the optimal feature combination under the TAA principle. In the second stage, the article screens base classifiers for DF21: considering the applicability of classifiers across datasets, it selects the classifiers with good classification performance on each dataset as base classifiers. In the third stage, the article constructs the default discrimination model based on DF21: following the idea that combining strong classifiers generates a stronger result, four strong classifiers are used as base classifiers to improve DF21's cascade structure. Results: Compared with the first stage, the Type II error (the proportion of the banks' principal loss) drops by 4.41%, 5.98%, and 13.00% on the Japanese, Australian, and German datasets, respectively, which proves the effectiveness of DF21. Conclusion: DF21 is significantly better than other classifiers and other scholars' models under the TAA principle.
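The TAA selection principle amounts to a lexicographic ranking over a hyperparameter grid: sort candidates by Type II error ascending, then AUC descending, then accuracy descending. A minimal sketch follows; scikit-learn's GradientBoostingClassifier stands in for XGBoost, and the small grid values are placeholders for the paper's 20–500 tree and 0.08–0.12 learning-rate ranges.

```python
# TAA-principle model selection sketch (assumptions: GradientBoosting
# stands in for XGBoost; class 1 is the defaulter class; the grid below
# is a placeholder, not the paper's full traversal).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, accuracy_score, confusion_matrix

X, y = make_classification(n_samples=600, n_features=15,
                           weights=[0.7, 0.3], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

results = []
for n_trees in (20, 100, 300):
    for lr in (0.08, 0.10, 0.12):
        clf = GradientBoostingClassifier(n_estimators=n_trees,
                                         learning_rate=lr,
                                         random_state=0).fit(X_tr, y_tr)
        pred = clf.predict(X_te)
        tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
        type2 = fn / (fn + tp)  # defaulter misclassified as non-defaulter
        auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
        acc = accuracy_score(y_te, pred)
        results.append((n_trees, lr, type2, auc, acc))

# TAA ranking: Type II error ascending, then AUC and accuracy descending.
best = min(results, key=lambda r: (r[2], -r[3], -r[4]))
print(best[:2])
```

Putting Type II error first reflects the paper's framing that misclassifying a defaulter as creditworthy costs the bank its principal, which is far more expensive than the Type I case of rejecting a good borrower.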