2023
DOI: 10.1186/s12911-023-02185-5
Classification of imbalanced data using machine learning algorithms to predict the risk of renal graft failures in Ethiopia

Abstract: Introduction: The prevalence of end-stage renal disease has raised the need for renal replacement therapy over recent decades. Even though a kidney transplant offers an improved quality of life and a lower cost of care than dialysis, graft failure is possible after transplantation. Hence, this study aimed to predict the risk of graft failure among post-transplant recipients in Ethiopia using selected machine learning prediction models. Methodology: …

Cited by 10 publications (10 citation statements)
References 46 publications
“…However, in case–control studies with equal sizes, data balancing may not be necessary for ML algorithms [32]. When using ML algorithms, data balancing is generally important when there is an imbalance between classes, i.e., when one class has significantly fewer observations than the other [33]. In such cases, balancing can improve the performance of the algorithm by reducing the bias in favor of the majority class [34].…”
Section: Methods (mentioning)
confidence: 99%
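The balancing step described in the statement above can be sketched in a few lines. The example below is illustrative rather than the cited study's code: it assumes the third-party imbalanced-learn package and uses a synthetic dataset in place of the transplant data.

```python
# Minimal sketch of data balancing with SMOTE oversampling.
# Assumes the third-party imbalanced-learn package; the dataset is synthetic.
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Synthetic imbalanced data: the minority class holds roughly 10% of samples.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
print("before balancing:", Counter(y))

# SMOTE synthesizes new minority-class samples by interpolating between
# existing minority samples and their nearest neighbors, reducing the bias
# toward the majority class without discarding any data.
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print("after balancing:", Counter(y_res))
```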
“…Eight machine learning models are chosen in this study, namely, logistic regression (LR), naïve Bayes, K-nearest neighbor (KNN), gradient boosted decision tree (GBDT), support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBoost), and adaptive boosting (AdaBoost) classifiers. This set of classifiers is chosen to combine classical models with advanced algorithms, meeting the primary requirements of better accuracy and suitability for a limited and imbalanced dataset [29]. All eight classifiers are used in a k-fold cross-validation approach, in which the whole dataset is split into k folds, training and testing are repeated across the folds, and the overall performance of each classifier is obtained as the average over the folds.…”
Section: Machine Learning (ML) Classifiers (mentioning)
confidence: 99%
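As a rough sketch of the k-fold procedure, each classifier can be scored fold by fold and the results averaged. The snippet below covers a subset of the eight listed classifiers with placeholder hyperparameters, k = 5, and synthetic data; none of these settings are taken from the cited work.

```python
# Sketch of k-fold cross-validation over several classifiers.
# Hyperparameters, k = 5, and the data are placeholders, not the study's settings.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, weights=[0.85, 0.15], random_state=0)

classifiers = {
    "LR": LogisticRegression(max_iter=1000),
    "NB": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "RF": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

# StratifiedKFold keeps the class ratio in every fold, which matters for
# imbalanced data; overall performance is the mean score across the folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=cv, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f} (+/- {scores.std():.3f})")
```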
“…The Scikit-learn library of the open-source Python software 3.9.10 (Python Software Foundation, Delaware, USA) [30] is used for the classification problem. Table 1 lists the hyperparameters of some of the classifiers used in this work, chosen based on suggestions in the literature to obtain the best possible classification [29]. The standard evaluation measures of classifiers, viz. sensitivity (Se), specificity (Sp), positive predictive value (PPV), negative predictive value (NPV), accuracy (Acc), and F1-score (F1), are computed for each classifier and compared.…”
Section: Machine Learning (ML) Classifiers (mentioning)
confidence: 99%
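All six measures follow directly from the binary confusion matrix. The helper below makes the definitions explicit; the labels and predictions are toy values, not results from the study.

```python
# Compute Se, Sp, PPV, NPV, Acc, and F1 from a binary confusion matrix.
from sklearn.metrics import confusion_matrix

def evaluation_measures(y_true, y_pred):
    """Return the six standard measures named in the text."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    se = tp / (tp + fn)             # sensitivity (recall)
    sp = tn / (tn + fp)             # specificity
    ppv = tp / (tp + fp)            # positive predictive value (precision)
    npv = tn / (tn + fn)            # negative predictive value
    acc = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * ppv * se / (ppv + se)  # harmonic mean of PPV and Se
    return {"Se": se, "Sp": sp, "PPV": ppv, "NPV": npv, "Acc": acc, "F1": f1}

# Toy example with illustrative labels and predictions.
print(evaluation_measures([0, 0, 1, 1, 1, 0], [0, 1, 1, 1, 0, 0]))
```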
“…Third, the simple threshold-moving method can be applied. Mulugeta et al. (32) used several ML algorithms, such as LR, naïve Bayes, ANN, and RF, with the threshold-moving technique to predict the risk of graft failure on imbalanced kidney transplant recipient data. The results showed that the data-driven threshold-moving technique improved prediction on imbalanced data compared with the natural threshold of 0.5.…”
Section: Related Work (mentioning)
confidence: 99%
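One common way to choose such a data-driven threshold is to sweep candidate cut-offs over validation-set probabilities and keep the one that maximizes a chosen criterion. The sketch below uses Youden's J on a ROC curve; the criterion, classifier, and data are illustrative assumptions, not the cited study's exact procedure.

```python
# Sketch of data-driven threshold moving: pick the probability cut-off
# that maximizes Youden's J (Se + Sp - 1) instead of using the natural 0.5.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=1)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_val)[:, 1]  # P(class = 1) on validation data

# roc_curve yields one (fpr, tpr) pair per candidate threshold;
# tpr - fpr is Youden's J, so its argmax gives the best cut-off.
fpr, tpr, thresholds = roc_curve(y_val, proba)
best = thresholds[np.argmax(tpr - fpr)]
print(f"data-driven threshold: {best:.3f} (vs. natural 0.5)")

y_pred = (proba >= best).astype(int)  # classify with the moved threshold
```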
“…In the algorithm-level technique, a cost or weight schema is used to mitigate the bias towards the majority class in the underlying classifiers or their output, an approach commonly known as cost-sensitive learning (31). Compared with data-level techniques, this technique does not require altering the original data distribution, as the modified algorithms account for the uneven class distribution during training, which yields more accurate performance than data-sampling techniques (32). In addition, a simple and straightforward method named threshold moving, which shifts the decision threshold on the output so that high-cost samples are harder to misclassify, has also shown effective results for the class imbalance problem (33, 34).…”
Section: Introduction (mentioning)
confidence: 99%
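In scikit-learn, this weighting schema is typically exposed through a class_weight option, which leaves the original data distribution untouched. A minimal sketch, assuming a random forest on synthetic data (the classifier and settings are illustrative):

```python
# Sketch of cost-sensitive learning: weight classes inversely to their
# frequency so that misclassifying the minority class costs more.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=2)

# class_weight="balanced" sets each class weight to
# n_samples / (n_classes * n_samples_in_class), so no resampling is needed.
weighted = RandomForestClassifier(class_weight="balanced", random_state=2)
unweighted = RandomForestClassifier(random_state=2)

for name, clf in [("weighted", weighted), ("unweighted", unweighted)]:
    f1 = cross_val_score(clf, X, y, cv=5, scoring="f1").mean()
    print(f"{name}: mean F1 = {f1:.3f}")
```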