As the financial industry has advanced rapidly, financial threats increasingly concern the credit risk of commercial banks. One of the biggest threats faced by commercial banks is predicting the credit risk of their clients. Recent studies mostly focus on enhancing classifier performance for credit card default prediction rather than on building an interpretable model. In classification problems, handling an imbalanced dataset is also crucial for improving model performance, because most cases lie in one class and only a few examples belong to the other classes. Traditional statistical approaches are not suitable for dealing with imbalanced data. In this study, a model is developed for credit default prediction using various credit-related datasets. Because the minimum and maximum values often differ greatly across features, Min-Max normalization is used to scale the features to a common range. Data-level resampling techniques, namely various undersampling and oversampling methods, are employed to overcome the class imbalance problem. Different machine learning models are also employed to obtain efficient results. We formulated hypotheses on whether the models developed with different machine learning techniques perform significantly the same or differently, and whether resampling techniques significantly improve the performance of the proposed models. One-way analysis of variance (ANOVA), a hypothesis-testing technique, is used to test the significance of the results. The results are validated with the split method, in which the data are divided into training and test sets. On the imbalanced datasets, the models achieve an accuracy of 66.9% on the Taiwan clients credit dataset, 70.7% on the South German clients credit dataset, and 65% on the Belgium clients credit dataset.
Conversely, with our proposed methods, accuracy improves significantly to 89% on the Taiwan clients credit dataset, 84.6% on the South German clients credit dataset, and 87.1% on the Belgium clients credit dataset. The classifiers thus perform better on the balanced datasets than on the imbalanced ones. It is also observed that oversampling techniques perform better than undersampling techniques. Overall, the Gradient Boosted Decision Tree method performs better than the other traditional machine learning classifiers, and it gives the best results when combined with the K-means SMOTE oversampling method. Using one-way ANOVA, the null hypothesis was rejected with p < 0.001, confirming that the improvement in the proposed model's performance is statistically significant. The interpretable model is also deployed on the web to assist different stakeholders. This model will help commercial banks, financial organizations, loan institutes, and other decision-makers identify loan defaulters earlier.
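The Min-Max normalization step in the credit-default study rescales every feature to [0, 1] with x' = (x − x_min) / (x_max − x_min). The snippet below is a minimal sketch, assuming a plain list-of-rows dataset; the function names and the toy values are hypothetical and not taken from the study. It fits the bounds on the training split only and reuses them on the test split, which fits naturally with the train/test split validation described above.

```python
from typing import List, Tuple

Rows = List[List[float]]

def fit_min_max(rows: Rows) -> Tuple[List[float], List[float]]:
    """Return per-feature minima and maxima, computed column by column."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def apply_min_max(rows: Rows, mins: List[float], maxs: List[float]) -> Rows:
    """Scale each feature to [0, 1] using previously fitted bounds.

    Constant features (max == min) map to 0.0 to avoid division by zero.
    Test values outside the fitted range are not clipped.
    """
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]

# Hypothetical toy data: two features with very different ranges
# (for example, a credit limit and an age).
train_rows = [[20000.0, 24.0], [120000.0, 35.0], [500000.0, 60.0]]
test_rows = [[260000.0, 42.0]]

mins, maxs = fit_min_max(train_rows)      # bounds come from training data only
print(apply_min_max(train_rows, mins, maxs))
print(apply_min_max(test_rows, mins, maxs))  # [[0.5, 0.5]]
```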
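One-way ANOVA, used above to test whether the models differ significantly, compares the spread between group means with the spread inside the groups: F = [SS_between / (k − 1)] / [SS_within / (N − k)] for k groups and N observations in total. The following is a small pure-Python sketch of the F statistic only; the function name and the toy scores are hypothetical and do not reproduce the study's data.

```python
from typing import Sequence

def one_way_anova_f(*groups: Sequence[float]) -> float:
    """Return the one-way ANOVA F statistic for k independent groups.

    Raises ValueError if every group has zero internal variance.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    if ss_within == 0:
        raise ValueError("F is undefined when within-group variance is zero")
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical toy scores for three models over three runs each.
f_stat = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f_stat)  # 27.0
```

If SciPy is available, `scipy.stats.f_oneway(a, b, c)` returns the same F statistic together with its p-value; a p-value below the chosen significance level (the study reports p < 0.001) leads to rejecting the null hypothesis that all group means are equal.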
In today’s world, lung cancer is a significant health burden and one of the leading causes of death. A leading type of lung cancer is malignant mesothelioma (MM). Most MM patients do not show any symptoms. Etiology plays a vital role in the diagnosis of any disease. Positron emission tomography (PET), magnetic resonance imaging (MRI), biopsies, X-rays, and blood tests are essential methods for identifying MM risk factors, but they are costly and invasive. In this work, we mainly focus on exploring MM risk factors. Mesothelioma symptoms were identified using patient data, and the dataset comprised both healthy individuals and mesothelioma patients. The dataset is prone to a class imbalance problem, in which the number of MM patients is significantly smaller than the number of healthy individuals. To overcome this problem, the synthetic minority oversampling technique (SMOTE) was used. Before the association rule mining-based Apriori algorithm was applied to the preprocessed dataset, both duplicate and irrelevant attributes were removed, and the numerical attributes were converted into nominal attributes. Association rules were then generated from the dataset. Our results show that erythrocyte sedimentation rate, asbestos exposure and its duration, and the pleural and serum lactic dehydrogenase ratio are major risk factors for MM. The severe stages of MM can be avoided by identifying the risk factors of the disease earlier. Failure to identify these risk factors can lead to an increased risk of multiple medical conditions, including cardiovascular diseases, mental distress, diabetes, and anemia.
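SMOTE, used in the mesothelioma study to rebalance the classes, creates each synthetic minority sample on the line segment between a minority sample x_i and one of its k nearest minority-class neighbours x_nn: x_new = x_i + λ(x_nn − x_i) with λ drawn uniformly from [0, 1). Below is a minimal pure-Python sketch of that basic interpolation; the function name, parameters, and toy samples are hypothetical. Library implementations such as the `SMOTE` and `KMeansSMOTE` classes in `imblearn.over_sampling` (imbalanced-learn) add refinements that this sketch omits.

```python
import math
import random
from typing import List

def smote(minority: List[List[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Basic SMOTE: interpolate new points between minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, nn)])
    return synthetic

# Hypothetical minority-class samples with two numeric features.
minority = [[1.0, 1.0], [2.0, 2.0], [1.5, 1.0], [2.0, 1.5]]
synthetic = smote(minority, n_new=4, k=2)
balanced_minority = minority + synthetic
print(len(balanced_minority))  # 8
```

Resampling like this is normally applied to the training split only, so that synthetic points never leak into the evaluation data.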
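The Apriori step works on nominal items, which is why the numerical attributes are discretized first. Apriori grows frequent itemsets level by level and prunes any candidate with an infrequent subset (every subset of a frequent itemset must itself be frequent); rules X → Y are then kept when confidence = support(X ∪ Y) / support(X) is high enough. The following is a compact pure-Python sketch; the function names, thresholds, and records are hypothetical and are not drawn from the mesothelioma dataset.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]

def apriori(transactions: List[Set[str]], min_support: float) -> Dict[Itemset, float]:
    """Return every frequent itemset with its support (fraction of records)."""
    n = len(transactions)
    level = {frozenset([item]) for t in transactions for item in t}
    frequent: Dict[Itemset, float] = {}
    size = 1
    while level:
        # Count support of the current candidates and keep the frequent ones.
        current: Dict[Itemset, float] = {}
        for cand in level:
            supp = sum(cand <= t for t in transactions) / n
            if supp >= min_support:
                current[cand] = supp
        frequent.update(current)
        size += 1
        # Join frequent itemsets into larger candidates, then prune any
        # candidate that has an infrequent subset (downward closure).
        level = set()
        for a, b in combinations(list(current), 2):
            cand = a | b
            if len(cand) == size and all(
                    frozenset(sub) in current for sub in combinations(cand, size - 1)):
                level.add(cand)
    return frequent

def association_rules(frequent: Dict[Itemset, float], min_confidence: float
                      ) -> List[Tuple[Itemset, Itemset, float, float]]:
    """Rules X -> Y with confidence = support(X ∪ Y) / support(X)."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs_items in combinations(sorted(itemset), r):
                lhs = frozenset(lhs_items)
                conf = supp / frequent[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, supp, conf))
    return rules

# Hypothetical discretized records (illustrative only).
records = [
    {"A=high", "B=yes", "class=MM"},
    {"A=high", "B=yes", "class=MM"},
    {"A=high", "class=MM"},
    {"B=yes", "class=healthy"},
]
frequent = apriori(records, min_support=0.5)
for lhs, rhs, supp, conf in association_rules(frequent, min_confidence=0.9):
    print(sorted(lhs), "->", sorted(rhs), f"support={supp:.2f}", f"confidence={conf:.2f}")
```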
A vehicular ad hoc network (VANET) is a spontaneously evolving wireless network, and clustering in such networks is challenging because of rapidly changing topology and frequent disconnections. Cluster head (CH) stability plays a prominent role in the robustness and scalability of the network. A stable CH minimizes intra- and intercluster communication, thereby reducing overhead. These challenges led the authors to seek a CH selection method based on a weighted combination of four metrics: befit factor, community neighborhood, eccentricity, and trust. CH stability depends on the vehicle’s speed, distance, velocity, and change in acceleration, all of which are included in the befit factor. Accurate vehicle localization under changing mobility conditions is also vital, so the location predicted with the help of a Kalman filter is used to evaluate CH stability. For the befit factor, the results show better performance than the existing state of the art. Changes in network dynamics and frequent disconnections of communication links caused by high vehicle speeds are inevitable. To address this problem, a graph-based approach is used to evaluate eccentricity and community neighborhood, and link reliability is calculated using the eigengap heuristic. The last metric is trust, which, according to the literature, has not been included in a weighted approach to date. An adaptive spectrum-sensing scheme is designed to evaluate trust values, specifically for primary users. A long short-term memory (LSTM) network, a type of deep recurrent neural network, is trained to estimate the probability of detection under various signal and noise conditions. The false rate is drastically reduced by using the LSTM. The proposed scheme is tested on the real map of Chengdu, in southwestern China’s Sichuan province, with different vehicular mobility patterns.
A comparative study of the individual and weighted metrics shows a significant improvement in cluster head stability at high vehicular density. Network performance also improves considerably in terms of energy, packet delay, packet delay ratio, and throughput.
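The VANET abstract uses a Kalman filter to predict vehicle locations but does not describe the filter's state model. The sketch below assumes the simplest common choice: motion along a single road axis with a constant-velocity state [position, velocity], position-only measurements, and white-noise acceleration as process noise. The function name and the noise values `q` and `r` are hypothetical and chosen only for illustration.

```python
from typing import List

def kalman_cv_predict(positions: List[float], dt: float = 1.0,
                      q: float = 0.1, r: float = 4.0) -> List[float]:
    """Constant-velocity Kalman filter over noisy 1-D positions.

    State x = [position, velocity]; only position is observed (H = [1, 0]).
    q: white-noise acceleration intensity; r: measurement variance.
    Returns the predicted position one step ahead after each update.
    """
    x = [positions[0], 0.0]          # start at the first fix, velocity unknown
    P = [[r, 0.0], [0.0, 100.0]]     # large initial velocity uncertainty
    # Process noise for a discrete white-noise acceleration model.
    Q = [[q * dt**4 / 4, q * dt**3 / 2], [q * dt**3 / 2, q * dt**2]]
    predictions = []
    for z in positions[1:]:
        # Predict: x <- F x, P <- F P F^T + Q, with F = [[1, dt], [0, 1]].
        x = [x[0] + dt * x[1], x[1]]
        P = [
            [P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + Q[0][0],
             P[0][1] + dt * P[1][1] + Q[0][1]],
            [P[1][0] + dt * P[1][1] + Q[1][0],
             P[1][1] + Q[1][1]],
        ]
        # Update with the new position measurement z.
        s = P[0][0] + r                     # innovation variance
        k0, k1 = P[0][0] / s, P[1][0] / s   # Kalman gain
        y = z - x[0]                        # innovation
        x = [x[0] + k0 * y, x[1] + k1 * y]
        P = [
            [(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
            [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]],
        ]
        predictions.append(x[0] + dt * x[1])  # expected position at the next step
    return predictions

# Hypothetical vehicle moving 10 m per time step along a road.
preds = kalman_cv_predict([10.0 * t for t in range(11)])
print(round(preds[-1], 2))  # close to 110, the true next position
```

A 2-D version would use a four-element state (x, y, vx, vy); the one-axis form is kept here for brevity.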
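The eigengap heuristic mentioned for link reliability chooses a number of groups k at the largest gap between consecutive eigenvalues of a graph Laplacian (sorted in ascending order). The abstract does not say how that gap is turned into a reliability value, so the sketch below only illustrates the heuristic itself on a hypothetical weighted vehicle-link graph, using the unnormalized Laplacian L = D − W and a cyclic Jacobi eigenvalue routine to stay dependency-free. All names and weights are assumptions.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

def symmetric_eigenvalues(a: Matrix, tol: float = 1e-12,
                          max_sweeps: int = 100) -> List[float]:
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def laplacian(adj: Matrix) -> Matrix:
    """Unnormalized graph Laplacian L = D - W for a weighted adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]

def eigengap_k(adj: Matrix) -> Tuple[int, List[float]]:
    """Pick k at the largest gap between consecutive Laplacian eigenvalues."""
    eig = symmetric_eigenvalues(laplacian(adj))
    gaps = [eig[i + 1] - eig[i] for i in range(len(eig) - 1)]
    return gaps.index(max(gaps)) + 1, eig

# Hypothetical link graph: vehicles 0-2 and 3-5 form two tight groups
# joined by one weak link (weight 0.1) between vehicles 2 and 3.
W = [[0.0] * 6 for _ in range(6)]
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    W[i][j] = W[j][i] = w

k, eigenvalues = eigengap_k(W)
print(k, [round(v, 3) for v in eigenvalues])
```

A small second eigenvalue followed by a large gap signals that the graph splits into weakly connected groups, which is the intuition behind using the eigengap when links are fragile.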
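The CH selection combines the befit factor, community neighborhood, eccentricity, and trust through a weighted sum. The abstract gives neither the weights nor the normalization, so the sketch below assumes each metric is already scaled to [0, 1], uses hypothetical equal weights, and inverts eccentricity on the assumption that a more central vehicle (lower eccentricity) is a better cluster head. The class, function names, and metric values are all illustrative.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence

@dataclass
class VehicleMetrics:
    vehicle_id: str
    befit: float         # befit factor (speed, distance, acceleration), in [0, 1]
    community: float     # community-neighbourhood score, in [0, 1]
    eccentricity: float  # normalized graph eccentricity, in [0, 1]
    trust: float         # trust value from spectrum sensing, in [0, 1]

# Hypothetical equal weights; the actual weights are not stated in the abstract.
WEIGHTS: Mapping[str, float] = {"befit": 0.25, "community": 0.25,
                                "eccentricity": 0.25, "trust": 0.25}

def ch_score(m: VehicleMetrics, w: Mapping[str, float] = WEIGHTS) -> float:
    """Weighted sum of the four metrics; higher means a better CH candidate."""
    return (w["befit"] * m.befit
            + w["community"] * m.community
            # Assumption: low eccentricity (a central vehicle) is preferable,
            # so the metric is inverted before weighting.
            + w["eccentricity"] * (1.0 - m.eccentricity)
            + w["trust"] * m.trust)

def select_cluster_head(cluster: Sequence[VehicleMetrics],
                        w: Mapping[str, float] = WEIGHTS) -> VehicleMetrics:
    """Return the vehicle with the highest weighted score."""
    return max(cluster, key=lambda m: ch_score(m, w))

cluster = [
    VehicleMetrics("v1", befit=0.8, community=0.6, eccentricity=0.2, trust=0.9),
    VehicleMetrics("v2", befit=0.9, community=0.4, eccentricity=0.5, trust=0.5),
    VehicleMetrics("v3", befit=0.5, community=0.9, eccentricity=0.3, trust=0.4),
]
print(select_cluster_head(cluster).vehicle_id)  # v1
```

Passing a different `w` mapping makes it easy to compare individual metrics (one weight set to 1, the rest to 0) against the weighted combination, mirroring the comparison the abstract describes.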