We analyze the performance of a set of machine learning (ML) models in predicting default risk, using standard statistical models, such as logistic regression, as a benchmark. When only a limited information set is available, for example financial indicators alone, we find that ML models provide substantial gains in discriminatory power and precision compared with statistical models. This advantage diminishes when high-quality information, such as credit behavioral indicators obtained from the Credit Register, is also available, and becomes negligible when the dataset is small. We also evaluate the consequences of using an ML-based rating system on the supply of credit and the number of borrowers gaining access to credit. ML models channel a larger share of credit towards safer and larger borrowers and result in lower credit losses for lenders.
We compare statistical models commonly used in credit risk forecasting with machine learning (ML) algorithms. Using a large dataset that includes financial ratios and credit behavioral indicators for about 300,000 Italian non-financial firms from 2011 to 2017, we show that, when the models are trained on financial statement data only, ML models record a significant improvement in discriminatory power and precision relative to statistical models; however, this improvement is less pronounced when we enlarge the training dataset to include credit behavioral data.
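
As an illustration of what a comparison of discriminatory power can look like in practice, the sketch below computes the area under the ROC curve (AUROC) for two sets of default scores. It assumes AUROC is the measure of interest, which the text above does not state. The function name `auroc`, the variable names, and all numbers are hypothetical toy values, not data or results from the study.

```python
def auroc(labels, scores):
    """AUROC computed as the share of (defaulter, non-defaulter) pairs
    in which the defaulter receives the higher risk score.

    labels: 1 = firm defaulted, 0 = firm did not default
    scores: predicted default risk (higher = riskier)
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defaulter and one non-defaulter")
    # Each correctly ordered pair counts 1, each tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up outcomes and scores for eight firms (illustrative only).
labels = [0, 0, 1, 0, 1, 1, 0, 1]
benchmark_scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.30, 0.50, 0.70]   # e.g. a logistic regression
challenger_scores = [0.05, 0.30, 0.60, 0.10, 0.90, 0.55, 0.40, 0.75]  # e.g. an ML model

auc_benchmark = auroc(labels, benchmark_scores)
auc_challenger = auroc(labels, challenger_scores)
print(f"benchmark AUROC:  {auc_benchmark:.2f}")
print(f"challenger AUROC: {auc_challenger:.2f}")
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation of defaulters from non-defaulters, so the gap between the two values is one simple way to express a gain in discriminatory power. The pairwise loop is quadratic in the number of firms, so on a dataset of realistic size a rank-based implementation would be preferable.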