2023
DOI: 10.1186/s12911-023-02185-5
Classification of imbalanced data using machine learning algorithms to predict the risk of renal graft failures in Ethiopia

Abstract: Introduction: The prevalence of end-stage renal disease has raised the need for renal replacement therapy over recent decades. Even though a kidney transplant offers an improved quality of life and a lower cost of care than dialysis, graft failure is possible after transplantation. Hence, this study aimed to predict the risk of graft failure among post-transplant recipients in Ethiopia using selected machine learning prediction models. Methodology: …

Cited by 10 publications (10 citation statements)
References 46 publications
“…However, in case–control studies with equal sizes, data balancing may not be necessary for ML algorithms [32]. When using ML algorithms, data balancing is generally important when there is an imbalance between classes, i.e., when one class has significantly fewer observations than the other [33]. In such cases, balancing can improve the performance of the algorithm by reducing the bias in favor of the majority class [34].…”
Section: Methods (mentioning)
confidence: 99%
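The balancing step described in the statement above can be sketched in a few lines. The example below is illustrative rather than the cited study's code: it assumes the third-party imbalanced-learn package and uses a synthetic dataset in place of the transplant data.

```python
# Minimal sketch of data balancing with SMOTE oversampling.
# Assumes the third-party imbalanced-learn package; the dataset is synthetic.
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Synthetic imbalanced data: the minority class holds roughly 10% of samples.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
print("before balancing:", Counter(y))

# SMOTE synthesizes new minority-class samples by interpolating between
# existing minority samples and their nearest neighbors, reducing the bias
# toward the majority class without discarding any data.
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print("after balancing:", Counter(y_res))
```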
“…Eight machine learning models are chosen in this study, namely, logistic regression (LR), naïve Bayes, K-nearest neighbor (KNN), gradient boosted decision tree (GBDT), support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBoost), and adaptive boosting (AdaBoost) classifiers. This set of classifiers is chosen to combine classical models with advanced algorithms, meeting the primary requirements of better accuracy and suitability for a limited and imbalanced dataset [29]. All eight classifiers are used in a k-fold cross-validation approach, in which the whole dataset is split into k folds, training and testing are repeated across the folds, and the overall performance of each classifier is obtained as the average over the folds.…”
Section: Machine Learning (ML) Classifiers (mentioning)
confidence: 99%
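As a rough sketch of the k-fold procedure, each classifier can be scored fold by fold and the results averaged. The snippet below covers a subset of the eight listed classifiers with placeholder hyperparameters, k = 5, and synthetic data; none of these settings are taken from the cited work.

```python
# Sketch of k-fold cross-validation over several classifiers.
# Hyperparameters, k = 5, and the data are placeholders, not the study's settings.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, weights=[0.85, 0.15], random_state=0)

classifiers = {
    "LR": LogisticRegression(max_iter=1000),
    "NB": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "RF": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

# StratifiedKFold keeps the class ratio in every fold, which matters for
# imbalanced data; overall performance is the mean score across the folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=cv, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f} (+/- {scores.std():.3f})")
```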
“…The Scikit-learn library of the open-source Python software 3.9.10 (Python Software Foundation, Delaware, USA) [30] is used for the classification problem. Table 1 lists the hyperparameters of some of the classifiers used in this work, chosen based on suggestions in the literature to obtain the best possible classification [29]. The standard evaluation measures of classifiers, viz. sensitivity (Se), specificity (Sp), positive predictive value (PPV), negative predictive value (NPV), accuracy (Acc), and F1-score (F1), are computed for each classifier and compared.…”
Section: Machine Learning (ML) Classifiers (mentioning)
confidence: 99%
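All six measures follow directly from the binary confusion matrix. The helper below makes the definitions explicit; the labels and predictions are toy values, not results from the study.

```python
# Compute Se, Sp, PPV, NPV, Acc, and F1 from a binary confusion matrix.
from sklearn.metrics import confusion_matrix

def evaluation_measures(y_true, y_pred):
    """Return the six standard measures named in the text."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    se = tp / (tp + fn)             # sensitivity (recall)
    sp = tn / (tn + fp)             # specificity
    ppv = tp / (tp + fp)            # positive predictive value (precision)
    npv = tn / (tn + fn)            # negative predictive value
    acc = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * ppv * se / (ppv + se)  # harmonic mean of PPV and Se
    return {"Se": se, "Sp": sp, "PPV": ppv, "NPV": npv, "Acc": acc, "F1": f1}

# Toy example with illustrative labels and predictions.
print(evaluation_measures([0, 0, 1, 1, 1, 0], [0, 1, 1, 1, 0, 0]))
```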
“…Third, the simple threshold-moving method can be applied. Mulugeta et al. (32) used several ML algorithms, such as LR, naïve Bayes, ANN, and RF, with the threshold-moving technique to predict the risk of graft failure on imbalanced kidney transplant recipient data. The results showed that the data-driven threshold-moving technique improved prediction on imbalanced data compared with the natural threshold of 0.5.…”
Section: Related Work (mentioning)
confidence: 99%
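One common way to choose such a data-driven threshold is to sweep candidate cut-offs over validation-set probabilities and keep the one that maximizes a chosen criterion. The sketch below uses Youden's J on a ROC curve; the criterion, classifier, and data are illustrative assumptions, not the cited study's exact procedure.

```python
# Sketch of data-driven threshold moving: pick the probability cut-off
# that maximizes Youden's J (Se + Sp - 1) instead of using the natural 0.5.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=1)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_val)[:, 1]  # P(class = 1) on validation data

# roc_curve yields one (fpr, tpr) pair per candidate threshold;
# tpr - fpr is Youden's J, so its argmax gives the best cut-off.
fpr, tpr, thresholds = roc_curve(y_val, proba)
best = thresholds[np.argmax(tpr - fpr)]
print(f"data-driven threshold: {best:.3f} (vs. natural 0.5)")

y_pred = (proba >= best).astype(int)  # classify with the moved threshold
```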
“…In the algorithm-level technique, a cost or weight schema is used to mitigate the bias towards the majority class in the underlying classifiers or their output, an approach commonly known as cost-sensitive learning (31). Compared with data-level techniques, this technique does not require altering the original data distribution, as the modified algorithms account for the uneven class distribution during training, which yields more accurate performance than data-sampling techniques (32). In addition, a simple and straightforward method named threshold moving, which shifts the decision threshold on the output so that high-cost samples are harder to misclassify, has also shown effective results for the class imbalance problem (33, 34).…”
Section: Introduction (mentioning)
confidence: 99%
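In scikit-learn, this weighting schema is typically exposed through a class_weight option, which leaves the original data distribution untouched. A minimal sketch, assuming a random forest on synthetic data (the classifier and settings are illustrative):

```python
# Sketch of cost-sensitive learning: weight classes inversely to their
# frequency so that misclassifying the minority class costs more.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=2)

# class_weight="balanced" sets each class weight to
# n_samples / (n_classes * n_samples_in_class), so no resampling is needed.
weighted = RandomForestClassifier(class_weight="balanced", random_state=2)
unweighted = RandomForestClassifier(random_state=2)

for name, clf in [("weighted", weighted), ("unweighted", unweighted)]:
    f1 = cross_val_score(clf, X, y, cv=5, scoring="f1").mean()
    print(f"{name}: mean F1 = {f1:.3f}")
```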