This paper applies the Random Forest (RF) method to the robust modelling of credit default prediction. RF has proven to be an efficient classifier and can offer better interpretability than other classifiers. Using a Chinese micro-enterprise credit dataset, this study emphasizes a multidimensional analysis of credit risk, covering the whole sample, subsamples, and the incremental effect of groups of predictors. To ensure the interpretability of the model, the relative variable importance (RVI) of every predictor is reported according to its contribution to prediction accuracy. Performance is compared using five different performance measures. The empirical findings confirm that the RF technique is reliable and efficient across all of the criteria used in this study. In addition, the experimental analysis indicates that non-traditional variables have a significant effect on classification accuracy. This paper therefore recommends alternative predictors, such as the legal representative's basic information and internal non-financial factors, alongside traditional financial variables for sustainable model development. The modelling algorithm can be used by various financial market participants to systematically predict the credit default of individual and institutional customers.
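The abstract ranks predictors by their contribution to prediction accuracy. A common way to measure that contribution is permutation importance (mean decrease in accuracy): shuffle one predictor's column, re-score the model, and record the accuracy drop. The sketch below does this for any fitted classifier passed in as `predict` (for example, a random forest). It does not reproduce the paper's model or data. The function names are hypothetical, and scaling the scores so that the top predictor equals 100 is an assumed reporting convention.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def accuracy(predict: Callable[[Row], int], X: List[Row], y: List[int]) -> float:
    """Share of rows whose predicted class matches the observed label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def relative_variable_importance(predict: Callable[[Row], int], X: List[Row],
                                 y: List[int], n_repeats: int = 5,
                                 seed: int = 0) -> List[float]:
    """Permutation importance, scaled so the most important predictor scores 100.

    For each predictor j the column is shuffled, which breaks its link to the
    label, and the resulting drop in accuracy is averaged over n_repeats shuffles.
    (Hypothetical helper; the 0-100 scaling is an assumed convention.)
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    n_features = len(X[0])
    drops = []
    for j in range(n_features):
        total_drop = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with only column j replaced by its shuffled value.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            total_drop += baseline - accuracy(predict, X_perm, y)
        drops.append(total_drop / n_repeats)
    top = max(drops)
    if top <= 0:
        return [0.0] * n_features  # no predictor affects accuracy
    # Negative drops (shuffling happened to help) are clipped to zero.
    return [100.0 * (max(d, 0.0) / top) for d in drops]
```

For a toy model that looks only at the first predictor, the unused second predictor receives an importance of 0 and the first receives 100. In practice, `predict` would be the fitted forest's prediction function evaluated on held-out data.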
This paper aims to identify a suitable combination of contemporary feature selection techniques and robust prediction classifiers. To examine the impact of feature selection on classifier performance, we use two Chinese and three other real-world credit scoring datasets. The feature selection methods used are the least absolute shrinkage and selection operator (LASSO) and multivariate adaptive regression splines (MARS). The classifiers examined are classification and regression trees (CART), logistic regression (LR), artificial neural networks (ANN), and support vector machines (SVM). The empirical findings confirm that LASSO feature selection followed by the SVM classifier yields a remarkable improvement and outperforms the other competing classifiers. ANN also achieves improved accuracy with the feature selection methods, whereas LR improves classification efficiency only when feature selection is performed via LASSO. CART, however, shows no improvement under any combination. The proposed credit scoring strategy may be used to develop policies, progressive ideas, and operational guidelines for effective credit risk management at lending and other financial institutions. The findings of this study have practical value because, to date, there is no consensus on which combination of feature selection method and prediction classifier to use.
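The two-stage strategy described above first uses LASSO to discard predictors and then trains a classifier on the surviving subset. The sketch below shows only the LASSO selection step. It uses cyclic coordinate descent on the squared-error objective $\frac{1}{2n}\lVert y - Xb\rVert^2 + \lambda\lVert b\rVert_1$ and keeps the predictors whose coefficients remain non-zero. It makes several assumptions that the source does not state: squared-error loss rather than logistic LASSO, centred inputs with no intercept, and an illustrative `lam`. The function names are hypothetical. The downstream classifier (SVM in the study) is not shown.

```python
from typing import List


def soft_threshold(z: float, gamma: float) -> float:
    """Shrink z toward zero by gamma; values inside [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X: List[List[float]], y: List[float], lam: float,
                             n_iter: int = 100) -> List[float]:
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes the columns of X and the target y are centred, so no intercept is
    fitted (hypothetical helper, squared-error loss assumed).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant column carries no information
            # Correlation of x_j with the residual that excludes x_j's own contribution.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta  # keep residual = y - X @ beta
                beta[j] = new_bj
    return beta


def select_features(beta: List[float], tol: float = 1e-10) -> List[int]:
    """Indices of predictors whose LASSO coefficient is non-zero."""
    return [j for j, b in enumerate(beta) if abs(b) > tol]
```

A larger `lam` zeroes out more coefficients. With a strong predictor and a weak one, a moderate penalty keeps only the strong predictor, while a small penalty keeps both. The column indices returned by `select_features` would then define the reduced feature matrix passed to the classifier.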
This paper examines the impact of hybridization on the classification performance of sophisticated machine learning classifiers, such as gradient boosting (GB, TreeNet®) and random forest (RF), using multi-stage hybrid models. The empirical findings confirm that, overall, the hybrid model $\mathrm{GB}(X^{*}_{D_i};\ \hat{Y}_{D_i,\mathrm{LR}})$, which combines TreeNet® with logistic regression and a new dependent variable, offers significantly higher accuracy than the baselines and the other hybrid classifiers. However, the performance of the hybrid classifiers is not consistent across all types of datasets. On low-dimensional data, the constructed models consistently outperform the base classifiers. On high-dimensional data, however, the classification outcomes show little evidence of improvement, and in certain cases the hybrids underperform the baseline models. These findings are relevant to the analysis of high- and low-dimensional credit risk, small and medium enterprises, agricultural credit, and similar domains. Furthermore, the example credit risk scenario and its outcomes offer an alternative path for applying hybrid and machine learning approaches to more general problems in accounting and finance.