Acknowledgements

As this meticulous journey comes to an end, we would like to thank the many well-wishers whose invaluable contributions and assistance made this project a success. Our beloved professor, Mr. Hossain Arif, is one such person. His continuous support and guidance have kept us motivated and enabled us to give our best throughout. We would also like to thank Mr. Samiul Islam and Dr. Iftekharul Mobin for their continuous encouragement and their expertise in this field. Last but certainly not least, we would like to express our appreciation to our families, without whose unconditional support none of this would have been possible.
Abstract

A precise credit risk assessment system is vital to the proper functioning of a financial institution. Accurate estimates of credit risk allow lenders to continue operating profitably and transparently. As loan default rates gradually increase, bank authorities are finding it more and more difficult to assess loan requests correctly. Consequently, credit risk has become a widely discussed and studied topic throughout the world. Numerous solutions have been proposed, some more efficient than others, and research into this difficult problem is still ongoing. With the implications of this problem in mind, this paper proposes a machine learning model that can assess credit risk and predict likely loan defaulters for any credit lending institution. Taking into account a borrower's financial and social history, the model helps determine whether a customer's loan request should be accepted, which in turn can protect the creditor from incurring further losses. Using data on previous successful borrowers and loan defaulters, a comparative analysis of supervised learning models has been carried out, and the results can be used to predict the behavior of future borrowers. Different combinations of feature selection algorithms and classifiers have been tested, and the best model has been selected based on metrics such as accuracy, AUC score and F1 score. Recursive Feature Elimination with Cross-Validation (RFECV) and Principal Component Analysis (PCA) have been used to find the optimal number of features needed for an accurate prediction, which allows more efficient use of the limited resources available.
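As an illustration of how metrics such as accuracy, F1 score and AUC can be used to compare candidate models, the following is a minimal sketch in plain Python with no external libraries. It is not the code used in this work: the helper names, the 0.5 decision threshold, the rule of selecting the model with the highest AUC, and the labels and scores are all hypothetical. In practice, scikit-learn provides equivalent functions (`accuracy_score`, `f1_score` and `roc_auc_score` in `sklearn.metrics`).

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), treating label 1 as 'defaulter'."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, tn, _, _ = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via its rank interpretation: the probability that a randomly chosen
    positive receives a higher score than a randomly chosen negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    # Hypothetical threshold: predict "defaulter" when the score is at least 0.5.
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return {"accuracy": accuracy(y_true, y_pred), "f1": f1(y_true, y_pred), "auc": auc(y_true, scores)}


# Hypothetical validation labels (1 = defaulter) and predicted default
# probabilities from two hypothetical candidate models; not real data.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
candidates = {
    "model_a": [0.9, 0.2, 0.7, 0.4, 0.3, 0.1, 0.8, 0.6],
    "model_b": [0.6, 0.5, 0.4, 0.7, 0.2, 0.3, 0.9, 0.8],
}

results = {name: evaluate(y_true, s) for name, s in candidates.items()}
# Assumed selection rule: keep the candidate with the highest AUC.
best = max(results, key=lambda name: results[name]["auc"])

for name, metrics in results.items():
    print(name, metrics)
print("best:", best)  # prints "best: model_a"
```

With these illustrative numbers, model_a ranks defaulters above non-defaulters more consistently (AUC 0.9375 versus 0.75) and is selected. AUC is threshold-independent, whereas accuracy and F1 depend on the chosen cut-off, which is why reporting several metrics gives a fuller comparison.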
Since the assessment is performed in a supervised setting, Support Vector Machines (SVM), Random Forest, Extreme Gradient Boosting (XGBoost) and Logistic Regression have been used as the classifiers. To ensure that each combination is evaluated fairly, k-fold cross-validation has been used, so that every observation serves in both training and validation folds and the results are more balanced. Furthermore, GridSearchCV has been used to tune the selected hyperparameters of each model in order to obtain the best possible result. And based upon this ...