Credit scoring of financially excluded persons is challenging for financial institutions because such persons often lack financial data and because long physical distances hamper data collection. The remote collection of alternative data has the potential to overcome these challenges, enabling credit access for such individuals. Whereas previous research has investigated alternative data sources such as mobile phones, this research proposes integrating mobile-phone, satellite, and public geospatial data to improve credit evaluations where financial data are lacking. These disparate sources were integrated using both spatial and temporal analysis methods, such as spatial aggregation, resulting in various data combinations. The resulting data sets were used to train classifiers of varying complexity, from logistic regression to ensemble learning. Comparisons were based on several performance metrics, including accuracy and the area under the receiver operating characteristic curve (AUC). The combination of all three data sources performed significantly better than mobile-phone data alone, with the mean classifier accuracy and F1 score improving by 18% and 0.149, respectively. It is shown how these improvements can translate into cost savings for financial institutions through a reduction in misclassification errors. Alternative data combined in this manner could enhance credit provision to financially excluded persons while managing the associated risks, leading to greater financial inclusion.
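The link between better accuracy and F1 scores and lower costs can be made concrete with a confusion matrix. The sketch below is illustrative only: it treats "default" as the positive class, and the applicant counts, the unit costs `COST_FN` and `COST_FP`, and the names `Confusion` and `misclassification_cost` are hypothetical assumptions, not figures or methods from the study. It shows how fewer approved defaulters (false negatives) and fewer rejected good payers (false positives) reduce the total cost of errors.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts, treating 'default' (bad loan) as the positive class."""
    tp: int  # defaulters correctly rejected
    fp: int  # good payers wrongly rejected
    fn: int  # defaulters wrongly approved
    tn: int  # good payers correctly approved


def accuracy(c: Confusion) -> float:
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def f1_score(c: Confusion) -> float:
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def misclassification_cost(c: Confusion, cost_fn: float, cost_fp: float) -> float:
    """Total cost of errors: approved defaulters (FN) plus rejected good payers (FP)."""
    return c.fn * cost_fn + c.fp * cost_fp


# Hypothetical counts for 200 applicants; NOT results from the study.
baseline = Confusion(tp=30, fp=20, fn=30, tn=120)  # e.g. a mobile-phone-only model
combined = Confusion(tp=45, fp=15, fn=15, tn=125)  # e.g. a model using all three sources

# Hypothetical unit costs: an approved defaulter costs far more than a rejected good payer.
COST_FN, COST_FP = 1000.0, 100.0

for name, c in [("baseline", baseline), ("combined", combined)]:
    print(f"{name}: accuracy={accuracy(c):.3f} F1={f1_score(c):.3f} "
          f"cost={misclassification_cost(c, COST_FN, COST_FP):.0f}")

saving = (misclassification_cost(baseline, COST_FN, COST_FP)
          - misclassification_cost(combined, COST_FN, COST_FP))
print(f"saving={saving:.0f}")  # saving=15500
```

With these made-up numbers, accuracy rises from 0.75 to 0.85 and the cost of errors falls from 32,000 to 16,500 units. Because the two error types usually carry very different costs in lending, the saving depends on which errors a better model removes, not only on how many.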
Feature selection is crucial to the credit-scoring process, allowing irrelevant variables with low predictive power to be removed. Conventional credit-scoring techniques treat this as a separate process in which features are selected to improve a single statistical measure, such as accuracy; however, recent research has focused on meaningful business parameters such as profit. Because more than one factor may matter in the selection process, multi-objective optimization methods are needed. However, the comparative performance of multi-objective methods is known to vary with the test problem and the specific implementation. This research employed a recent hybrid non-dominated sorting binary Grasshopper Optimization Algorithm (NSBGOA) and compared its performance on multi-objective feature selection for credit scoring with that of two popular benchmark algorithms in this space. A further comparison was made to determine how changing the profit-maximizing base classifier affects algorithm performance. Experiments demonstrate that, of the base classifiers used, the neural network classifier most improved the profit-based measure and most reduced the mean number of features in the population. Additionally, NSBGOA gave relatively smaller hypervolumes and required more computational time across all base classifiers, while giving the highest mean objective values for its solutions. The base classifier clearly has a significant impact on the results of multi-objective optimization; therefore, the choice of base classifier should be considered carefully in such scenarios.
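Two ideas in this comparison, non-dominated sorting of candidate feature subsets and the hypervolume of the resulting front, can be sketched for the two objectives named here: maximizing a profit-based measure and minimizing the number of selected features. The sketch below is an assumption-laden illustration, not NSBGOA itself: it uses the fast non-dominated sorting procedure popularized by NSGA-II, an exact two-objective hypervolume, and hypothetical subset scores and reference point. The binary grasshopper position update is not shown.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b under minimisation of every objective."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(points: Sequence[Point]) -> List[List[int]]:
    """Fast non-dominated sorting (as in NSGA-II): returns fronts as lists of indices."""
    n = len(points)
    dominated_set = [[] for _ in range(n)]  # indices that each solution dominates
    domination_count = [0] * n              # how many solutions dominate each one
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            if dominates(points[p], points[q]):
                dominated_set[p].append(q)
            elif dominates(points[q], points[p]):
                domination_count[p] += 1
        if domination_count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            for q in dominated_set[p]:
                domination_count[q] -= 1
                if domination_count[q] == 0:
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


def hypervolume_2d(front: Sequence[Tuple[float, float]], ref: Tuple[float, float]) -> float:
    """Area dominated by a two-objective front (minimisation), bounded by reference point ref."""
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:      # ascending f1, so f2 falls along a non-dominated front
        if f2 < prev_f2:    # skip points hidden behind an earlier one
            hv += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv


# Hypothetical feature subsets: (profit-based measure, number of selected features).
subsets = [(0.8, 10), (0.7, 5), (0.6, 6), (0.5, 2), (0.4, 3)]
# Maximising profit is the same as minimising -profit, so both objectives are minimised.
objectives = [(-profit, float(k)) for profit, k in subsets]

fronts = non_dominated_sort(objectives)
print(fronts)  # [[0, 1, 3], [2, 4]]
first_front = [objectives[i] for i in fronts[0]]
# Hypothetical reference point: zero profit and one more feature than the largest subset.
print(hypervolume_2d(first_front, ref=(0.0, 11.0)))
```

Here the first front holds the subsets with 10, 5, and 2 features, and its hypervolume is roughly 5.8. For a fixed reference point and scaling, a larger hypervolume generally indicates a front that is closer to the ideal point and more widely spread; in practice the objectives are often normalized first so that the feature count does not dominate the area.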