Chronic kidney disease (CKD) is a worldwide public health problem, usually diagnosed in the late stages of the disease. To alleviate this issue, investment in early prediction is necessary. The purpose of this study is to assist the early prediction of CKD by addressing problems related to imbalanced and limited-size datasets. We used data from medical records of Brazilians with or without a diagnosis of CKD, containing the following attributes: hypertension, diabetes mellitus, creatinine, urea, albuminuria, age, gender, and glomerular filtration rate. We present an oversampling approach based on manual and automated augmentation. For automated augmentation, we experimented with the synthetic minority oversampling technique (SMOTE), Borderline-SMOTE, and Borderline-SMOTE SVM. We implemented models based on the following algorithms: decision tree (DT), random forest, and multi-class AdaBoosted DTs. We also applied the overall local accuracy and local class accuracy methods for dynamic classifier selection, and the k-nearest oracles-union, k-nearest oracles-eliminate, and META-DES methods for dynamic ensemble selection. We analyzed the models' performance using hold-out validation, multiple stratified cross-validation (CV), and nested CV. The DT model presented the highest accuracy score (98.99%) using manual augmentation and SMOTE. Our approach can assist in designing systems for the early prediction of CKD using imbalanced and limited-size datasets.
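The interpolation idea behind SMOTE, as named in the abstract above, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `smote`, its parameters, and the sample values are hypothetical, and only the basic variant is shown (not Borderline-SMOTE or Borderline-SMOTE SVM). Each synthetic row is placed at a random point on the segment between a minority-class row and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority rows.

    minority: list of equal-length numeric feature vectors from ONE class.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other rows
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical (creatinine, urea) values for an under-represented risk class.
high_risk = [[2.1, 80.0], [2.4, 95.0], [1.9, 70.0], [2.8, 110.0]]
new_rows = smote(high_risk, n_new=6, k=2)
print(len(new_rows), "synthetic rows generated")
```

Because every synthetic row lies between two real rows of the same class, the generated values stay within the range already observed for that class. Plain interpolation of this kind only suits numeric attributes; categorical ones such as gender would need separate handling.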
One factor that impacts the quality of Brazilian education is the quality of the books and other didactic materials distributed free of charge to public schools throughout the country under the Brazilian National Textbook Program. The current evaluation process may take at least two years to complete and involves hundreds of people, and its final result may impact the entire educational system. One of the first activities of the process is to validate and triage the editorial quality attributes of textbooks. However, the validation and triage process needs improvement, given the gradual growth in the quantity and variety of materials it must handle. This growth creates risks of reduced quality and late deliveries. This paper provides a comprehensive critical analysis of the validation and triage process based on the Policy Design Arc framework of Harvard's Kennedy School of Government. We identified causes that affect the quality of deliveries and the time required to conclude tasks. We also propose a theory of change for digital transformation, defining strategies to address the causes of problems, as well as outputs, outcomes, and impacts. We have gradually implemented our theory of change in the validation and triage process.
BACKGROUND Chronic kidney disease (CKD) is a worldwide public health problem, usually diagnosed in the late stages of the disease, which increases public health costs and mortality rates. Late diagnosis is even more critical in low- and middle-income countries due to high poverty levels, many hard-to-reach locations, and sometimes absent or precarious primary care. Therefore, to alleviate these issues, investment in early prediction is necessary. OBJECTIVE The purpose of this study is to assist the early prediction of CKD by addressing problems related to imbalanced and limited-size data sets. METHODS To address our multi-class problem (low risk, medium risk, high risk, and very high risk), we used data from the medical records of 60 Brazilians with or without a diagnosis of CKD, containing the following attributes: hypertension, diabetes mellitus, creatinine, urea, albuminuria, age, gender, and glomerular filtration rate. We used two approaches for oversampling: (1) manual augmentation with data validated by an experienced nephrologist and (2) automated augmentation with the synthetic minority oversampling technique (SMOTE), Borderline-SMOTE, and Borderline-SMOTE support vector machine. We implemented classification models based on these data sets and the following algorithms: decision tree (DT), random forest, and multi-class AdaBoosted DTs. We also applied the overall local accuracy and local class accuracy methods for dynamic classifier selection, and the k-nearest oracles-union, k-nearest oracles-eliminate, and META-DES methods for dynamic ensemble selection. We analyzed the models' performance using hold-out validation, multiple stratified cross-validation (CV), and nested CV. We also computed the importance of features using feature selection methods. RESULTS The best performance was achieved using the DT and multi-class AdaBoosted DTs classification models, oversampled with SMOTE and validated with the multiple stratified CV and nested CV methods. The DT model presented the highest accuracy score (98.99%) for both multiple stratified CV and nested CV, followed by multi-class AdaBoosted DTs (97.99% and 98.00%, respectively). CONCLUSIONS The SMOTE and multiple stratified CV or nested CV methods provided reliable results for such an imbalanced and limited-size data set. During CKD monitoring based on the DT model, and assuming a prior diabetes mellitus (DM) evaluation, the user only needs to perform two blood tests: creatinine and urea. Thus, the DT model can assist in designing systems for the early prediction of CKD, providing easy interpretation and cost reduction.
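As an illustration of the stratified CV mentioned in the abstract above, the following minimal sketch splits sample indices into folds that preserve class proportions, then repeats the split with different seeds as one possible reading of "multiple" stratified CV. It is not the authors' code: the function name `stratified_folds`, the label counts, and the fold and repeat counts are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign sample indices to folds so each fold keeps similar class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal this class's samples round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds


# Hypothetical risk labels for 20 records (imbalanced on purpose).
labels = ["low"] * 8 + ["medium"] * 6 + ["high"] * 4 + ["very high"] * 2
folds = stratified_folds(labels, n_folds=2, seed=0)

# Repeating the split with different seeds gives several stratified partitions.
repeats = [stratified_folds(labels, n_folds=2, seed=s) for s in range(3)]

for fold in folds:
    print(sorted(labels[i] for i in fold))
```

In each round, one fold serves as the test set and the remaining folds as the training set. Nested CV would add an inner loop of the same kind on each training set to tune hyperparameters, so that the outer test fold is never used for model selection. Oversampling is normally applied to the training folds only, to avoid leaking synthetic copies of test records into training.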
The high incidence and prevalence of chronic kidney disease (CKD), often caused by late diagnoses, is a critical public health problem. Qualitative and quantitative comparative analyses were carried out using a systematic literature review and an experiment with machine learning techniques, respectively. The J48 decision tree, with 95.00% accuracy, was used to develop an intelligent system to assess the risk of CKD. In addition, when a patient with CKD is away from their home municipality and an emergency occurs, the system recommends that the patient go to an appropriate health unit, depending on the clinical situation, to avoid late or inadequate health care.