We compare the performance of a wide set of regression techniques and machine learning algorithms for predicting recovery rates on non-performing loans, using a private database from a European debt collection agency. We find that rule-based algorithms such as Cubist, boosted trees and random forests perform significantly better than other approaches. In addition to loan contract specificities, predictors describing the bank's recovery process (prior to the portfolio's sale to the debt collector) also strongly improve forecasting performance. These variables, derived from the time series of contacts with defaulted clients and of clients' reimbursements to the bank, help all algorithms better identify debtors with different repayment ability and/or commitment and, more generally, with different recovery potential.
Combining forecasts formed by various models can substantially improve prediction performance compared to that obtained from the individual models. Standard combination approaches consist of a forecast selection step followed by a weighting scheme. It is not clear, however, which models to include and how to combine them. This is a central question, with a substantial impact on the quality of the aggregate forecast. We propose a robust method that mitigates estimation uncertainty and implicitly features forecast selection. Our approach relies on constrained optimization with penalty (COP). We take advantage of the equivalence between COP and constrained optimization with shrinkage of the prediction errors' covariance matrix (COS) to determine the optimal L2 penalty, thereby avoiding an expensive (and potentially harmful) cross-validation stage. Our method is tested empirically in a simulation exercise and on two applications in economics. The proposed combination schemes outperform the simple average and trimmed simple average forecasts, and perform at least as well as the best individual model(s) in the cases considered.
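As a rough illustration of the penalty/shrinkage equivalence described in this abstract, the sketch below computes minimum-variance combination weights under a sum-to-one constraint with an L2 penalty. It is not the paper's implementation: the function name `ridge_combination_weights`, the parameter `lam`, and the choice to impose only the sum-to-one constraint (no non-negativity constraint) are assumptions made for illustration. Under those assumptions, penalizing the weights by `lam * ||w||^2` is the same as adding `lam` to the diagonal of the error covariance matrix, which is one simple form of shrinkage. The abstract's rule for choosing the optimal penalty is not reproduced here; `lam` is simply supplied by the caller.

```python
import numpy as np

def ridge_combination_weights(errors, lam):
    """Combination weights from a (T, K) array of forecast errors, K >= 2.

    Hypothetical helper: minimizes w' S w + lam * ||w||^2 subject to
    sum(w) = 1, where S is the sample covariance of the errors.
    """
    errors = np.asarray(errors, dtype=float)
    k = errors.shape[1]
    sigma = np.cov(errors, rowvar=False)   # K x K error covariance
    shrunk = sigma + lam * np.eye(k)       # L2 penalty == diagonal shrinkage
    raw = np.linalg.solve(shrunk, np.ones(k))
    return raw / raw.sum()                 # enforce the sum-to-one constraint

# Usage with simulated (made-up) errors from three models of differing accuracy.
rng = np.random.default_rng(0)
sim_errors = rng.normal(size=(200, 3)) * np.array([1.0, 2.0, 3.0])
weights = ridge_combination_weights(sim_errors, lam=0.5)
```

With `lam = 0` this reduces to the classical minimum-variance weights, and as `lam` grows the weights approach the simple average, so the penalty interpolates between the two.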
While previous academic research highlights the potential of machine learning and big data for predicting corporate bond recovery rates, the operations management challenge is to identify the relevant predictive variables and the appropriate model. In this paper, we use meta-learning to combine the predictions from 20 candidate linear, nonlinear and rule-based algorithms, and we exploit a data set of predictors including security-specific factors, macro-financial indicators and measures of economic uncertainty. We find that the most promising approach consists of model combinations trained on security-specific characteristics and a limited number of well-identified, theoretically sound recovery rate determinants, including uncertainty measures. Our research provides useful indications for practitioners and regulators targeting more reliable risk measures in designing micro- and macro-prudential policies.
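The abstract does not say which meta-learning scheme is used, so the following is only a minimal sketch of one common form, stacked generalization, in which a meta-learner is fitted on out-of-fold predictions of the base models. Everything in it is an assumption for illustration: the helper names (`fit_ols`, `fit_quadratic`, `stack`), the two toy base learners standing in for the 20 candidates, and the use of ordinary least squares as the meta-learner.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta

def fit_quadratic(X, y):
    """OLS on the features and their squares (a stand-in nonlinear learner)."""
    model = fit_ols(np.column_stack([X, X ** 2]), y)
    return lambda Xn: model(np.column_stack([Xn, Xn ** 2]))

def stack(X, y, base_fitters, n_folds=5, seed=0):
    """Fit a stacked model and return its predict function."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    oof = np.zeros((len(y), len(base_fitters)))  # out-of-fold predictions
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        for j, fit in enumerate(base_fitters):
            oof[test_idx, j] = fit(X[train_idx], y[train_idx])(X[test_idx])
    meta = fit_ols(oof, y)                        # meta-learner on base outputs
    base_models = [fit(X, y) for fit in base_fitters]  # refit on all data
    return lambda Xn: meta(np.column_stack([m(Xn) for m in base_models]))
```

Fitting the meta-learner on out-of-fold rather than in-sample predictions keeps it from simply rewarding the base model that overfits the training data most.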