Credit risk prediction is an effective way of evaluating whether a potential borrower will repay a loan, particularly in peer-to-peer lending where class imbalance problems are prevalent. However, few credit risk prediction models for social lending consider imbalanced data and, further, the best resampling technique to use with imbalanced data is still controversial. In an attempt to address these problems, this paper presents an empirical comparison of various combinations of classifiers and resampling techniques within a novel risk assessment methodology that incorporates imbalanced data. The credit predictions from each combination are evaluated with a G-mean measure to avoid bias towards the majority class, which has not been considered in similar studies. The results reveal that combining random forest and random under-sampling may be an effective strategy for calculating the credit risk associated with loan applicants in social lending markets.
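As a minimal sketch of why the G-mean resists majority-class bias, the snippet below computes it as the geometric mean of sensitivity and specificity, which is its usual definition. The function name `g_mean`, the label encoding (1 = default, 0 = repaid), and the toy labels are assumptions made for illustration and are not taken from the paper.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels.

    The encoding (positive=1 meaning "default") is an illustrative assumption.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Guard against an empty class so the ratio is always defined.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical toy data: 8 repaid loans (0) and 2 defaults (1).
toy_truth = [0] * 8 + [1] * 2
always_repaid = [0] * 10  # a classifier that only predicts the majority class

accuracy = sum(t == p for t, p in zip(toy_truth, always_repaid)) / len(toy_truth)
print(accuracy)                          # 0.8
print(g_mean(toy_truth, always_repaid))  # 0.0
```

A classifier that ignores the minority class still reaches 80% accuracy on this toy data, but its G-mean is zero because it detects no defaults, which is the behaviour that motivates using the measure on imbalanced lending data.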
As big data analytics is adopted across a multitude of domains and applications, there is a need for new platforms and architectures that support analytic solution engineering as a lean and iterative process. In this paper we discuss how different software development processes can be adapted to data analytic process engineering, incorporating service-oriented architecture, scientific workflows, model-driven engineering and semantic technology. Based on the experience obtained through the ADAGE framework [1] and the findings of the survey on how semantic modeling is used for data analytic solution engineering [6], we propose two research directions for lean and flexible data analytic platforms: a big data analytic development lifecycle and data analytic knowledge management.
As one of the main business models in the financial technology field, peer-to-peer (P2P) lending has disrupted traditional financial services by providing an online platform for lending money that has remarkably reduced financial costs. However, the inherent uncertainty in P2P loans can result in huge financial losses for P2P platforms. Therefore, accurate risk prediction is critical to the success of P2P lending platforms. Indeed, even a small improvement in credit risk prediction would be of benefit to P2P lending platforms. This paper proposes an innovative credit risk prediction framework that fuses base classifiers based on a Choquet fuzzy integral. Choquet integral fusion improves creditworthiness evaluations by synthesizing the prediction results of multiple classifiers and finding the largest consistency between outcomes among conflicting and consistent results. The proposed model was validated through experimental analysis on a real-world dataset from a well-known P2P lending marketplace. The empirical results indicate that the combination of multiple classifiers based on fuzzy Choquet integrals outperforms the best base classifiers used in credit risk prediction to date. In addition, the proposed methodology is superior to some conventional combination techniques.
Keywords: Choquet fuzzy integral; fuzzy measure; credit risk prediction; peer-to-peer lending
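To illustrate the kind of fusion described in the abstract above, the following sketch evaluates the standard discrete Choquet integral over the scores of several base classifiers. It is not the authors' implementation: the classifier names, the scores, and the fuzzy measure values are all hypothetical, and the abstract does not say how the fuzzy measure is obtained, so here it is simply written out by hand.

```python
def choquet_integral(scores, measure):
    """Discrete Choquet integral of classifier scores w.r.t. a fuzzy measure.

    scores:  dict mapping classifier name -> score in [0, 1]
    measure: dict mapping frozenset of names -> measure value in [0, 1],
             monotone, with the full set mapped to 1.0
    """
    # Visit classifiers from the highest score to the lowest.
    ordered = sorted(scores, key=scores.get, reverse=True)
    total = 0.0
    coalition = set()
    for i, name in enumerate(ordered):
        coalition.add(name)
        # Score of the next classifier in the ordering, or 0 after the last one.
        next_score = scores[ordered[i + 1]] if i + 1 < len(ordered) else 0.0
        # Weight each drop in score by the measure of the coalition above it.
        total += (scores[name] - next_score) * measure[frozenset(coalition)]
    return total


# Hypothetical default-probability scores from three base classifiers.
example_scores = {"rf": 0.9, "lr": 0.6, "nn": 0.3}

# Hypothetical fuzzy measure: monotone, non-additive, full set = 1.0.
example_measure = {
    frozenset({"rf"}): 0.5,
    frozenset({"lr"}): 0.4,
    frozenset({"nn"}): 0.3,
    frozenset({"rf", "lr"}): 0.8,
    frozenset({"rf", "nn"}): 0.7,
    frozenset({"lr", "nn"}): 0.6,
    frozenset({"rf", "lr", "nn"}): 1.0,
}

fused = choquet_integral(example_scores, example_measure)
print(round(fused, 2))  # 0.69
```

Unlike a weighted average, the measure assigns a value to every coalition of classifiers, so agreement between particular classifiers can count for more or less than the sum of their individual weights. When all classifiers give the same score, the integral returns that score.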