Improving Risk Identification of Adverse Outcomes in Chronic Heart Failure Using SMOTE+ENN and Machine Learning

Wang, Ke; Tian, Jing; Zheng, Chen; Yang, Hong; Ren, Jia; Li, Chenhao; Han, Qinghua; Zhang, Yanbo

doi:10.2147/rmhp.s310295

Cited by 32 publications

(16 citation statements)

References 41 publications


“…The Synthetic Minority Over-Sampling Technique (SMOTE) is an oversampling technique that is an effective algorithm for handling imbalances between data classes ( 16 ). It uses k-neighbour synthesis to amplify minority classes to obtain a balanced data set ( 17 ) that exhibits good performance in areas such as network intrusion detection systems and disease detection. In this study, there is a serious imbalance in the response variables, ACR outcomes and MCR outcomes ( Figures 2A , B ).…”
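The "k-neighbour synthesis" mentioned in the statement above can be illustrated with a minimal pure-Python sketch. It is not the citing authors' implementation: the function name `smote`, the toy points, and the parameters `k` and `seed` are assumptions made for illustration. Each synthetic sample is placed at a random position on the line segment between a minority sample and one of its k nearest minority-class neighbours.

```python
import math
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (illustrative sketch)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical toy data: 4 minority samples, to be grown to match 10 majority samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority_size = 10
new_points = smote(minority, majority_size - len(minority), k=2)
balanced_minority = minority + new_points
print(len(balanced_minority), "minority samples after oversampling")
```

Because every synthetic point is an interpolation between two existing minority samples, the new points stay inside the region already occupied by the minority class rather than duplicating samples exactly.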

Section: Participants and Methods (mentioning)

confidence: 99%

Machine learning-based warning model for chronic kidney disease in individuals over 40 years old in underprivileged areas, Shanxi Province

et al. 2023


Introduction: Chronic kidney disease (CKD) is a progressive disease with high incidence but early imperceptible symptoms. Since China’s rural areas are subject to inadequate medical check-ups and single disease screening programme, it could easily translate into end-stage renal failure. This study aimed to construct an early warning model for CKD tailored to impoverished areas by employing machine learning (ML) algorithms with easily accessible parameters from ten rural areas in Shanxi Province, thereby, promoting a forward shift of treatment time and improving patients’ quality of life. Methods: From April to November 2019, CKD opportunistic screening was carried out in 10 rural areas in Shanxi Province. First, general information, physical examination data, blood and urine specimens were collected from 13,550 subjects. Afterward, feature selection of explanatory variables was performed using LASSO regression, and target datasets were balanced using the SMOTE (synthetic minority over-sampling technique) algorithm, i.e., albuminuria-to-creatinine ratio (ACR) and α1-microglobulin-to-creatinine ratio (MCR). Next, Bagging, Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) were employed for classification of ACR outcomes and MCR outcomes, respectively. Results: 12,330 rural residents were included in this study, with 20 explanatory variables. The cases with increased ACR and increased MCR represented 1,587 (12.8%) and 1,456 (11.8%), respectively. After conducting LASSO, 14 and 15 explanatory variables remained in these two datasets, respectively. Bagging, RF, and XGBoost performed well in classification, with the AUC reaching 0.74, 0.87, 0.87, 0.89 for ACR outcomes and 0.75, 0.88, 0.89, 0.90 for MCR outcomes. The five variables contributing most to the classification of ACR outcomes and MCR outcomes constituted SBP, TG, TC, and Hcy, DBP and age, TG, SBP, Hcy and FPG, respectively. Overall, the machine learning algorithms could emerge as a warning model for CKD. Conclusion: ML algorithms in conjunction with rural accessible indexes boast good performance in classification, which allows for an early warning model for CKD. This model could help achieve large-scale population screening for CKD in poverty-stricken areas and should be promoted to improve the quality of life and reduce the mortality rate.


“…The Synthetic Minority Over-Sampling Technique (SMOTE) is an oversampling technique that is an effective algorithm for dealing with imbalances between data classes ( 21 ). It’s employed to synthetically enlarge the minority class using K-nearest neighbors to obtain a balanced data set ( 22 ) and has shown good performance in such fields as network intrusion detection systems and disease detection. In this study, there is a serious imbalance in the response variables, GI and TI ( Figures 1A,B ).…”

Section: Participants and Methods (mentioning)

confidence: 99%

Using random forest algorithm for glomerular and tubular injury diagnosis

Song, Zhou, Qi et al. 2022

Front. Med.


Objectives: Chronic kidney disease (CKD) is a common chronic condition with high incidence and insidious onset. Glomerular injury (GI) and tubular injury (TI) represent early manifestations of CKD and could indicate the risk of its development. In this study, we aimed to classify GI and TI using three machine learning algorithms to promote their early diagnosis and slow the progression of CKD. Methods: Demographic information, physical examination, blood, and morning urine samples were first collected from 13,550 subjects in 10 counties in Shanxi province for classification of GI and TI. Besides, LASSO regression was employed for feature selection of explanatory variables, and the SMOTE (synthetic minority over-sampling technique) algorithm was used to balance target datasets, i.e., GI and TI. Afterward, Random Forest (RF), Naive Bayes (NB), and logistic regression (LR) were constructed to achieve classification of GI and TI, respectively. Results: A total of 12,330 participants enrolled in this study, with 20 explanatory variables. The number of patients with GI, and TI were 1,587 (12.8%) and 1,456 (11.8%), respectively. After feature selection by LASSO, 14 and 15 explanatory variables remained in these two datasets. Besides, after SMOTE, the number of patients and normal ones were 6,165, 6,165 for GI, and 6,165, 6,164 for TI, respectively. RF outperformed NB and LR in terms of accuracy (78.14, 80.49%), sensitivity (82.00, 84.60%), specificity (74.29, 76.09%), and AUC (0.868, 0.885) for both GI and TI; the four variables contributing most to the classification of GI and TI represented SBP, DBP, sex, age and age, SBP, FPG, and GHb, respectively. Conclusion: RF boasts good performance in classifying GI and TI, which allows for early auxiliary diagnosis of GI and TI, thus facilitating to help alleviate the progression of CKD, and enjoying great prospects in clinical practice.
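The accuracy, sensitivity, and specificity figures reported in the abstract above are all derived from a confusion matrix. The sketch below shows the standard definitions; the function name `classification_metrics` and the counts passed to it are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # share of true cases that were detected
    specificity = tn / (tn + fp)            # share of non-cases correctly ruled out
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity

# Hypothetical counts for illustration only; not the study's confusion matrix.
accuracy, sensitivity, specificity = classification_metrics(tp=82, fn=18, tn=74, fp=26)
print(f"accuracy={accuracy:.2%} sensitivity={sensitivity:.2%} specificity={specificity:.2%}")
```

When the two classes are the same size, as in this toy example, accuracy equals the mean of sensitivity and specificity.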


“…To address imbalances among classes within a dataset, we also employed a combination of both undersampling and oversampling, SMOTEENN (Supplemental Appendix 5). 24 Since an imbalance in classes can have a considerable impact on the performance of a classifier, 25 the training set should be balanced. 26 We applied SMOTEENN to 80% of the records used for training.…”
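The idea in the statement above, resampling only the training portion and leaving the held-out records untouched, can be sketched in pure Python as follows. This is an illustrative approximation, not the citing authors' code or any library's SMOTEENN API. The helper names, the toy Gaussian data, the stratified 80/20 split, the choice of label 1 as the minority class, and the majority-vote cleaning rule are all assumptions.

```python
import math
import random

def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding itself."""
    order = sorted((j for j in range(len(points)) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))
    return order[:k]

def smote(minority, n_new, k, rng):
    """Oversampling step: interpolate n_new points between minority neighbours."""
    out = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(knn_indices(minority, i, k))
        gap = rng.random()
        out.append(tuple(a + gap * (b - a) for a, b in zip(minority[i], minority[j])))
    return out

def edited_nearest_neighbours(X, y, k=3):
    """Cleaning step: drop samples whose label disagrees with most of their
    k nearest neighbours (majority-vote variant, assumed for this sketch)."""
    keep = [i for i in range(len(X))
            if [y[j] for j in knn_indices(X, i, k)].count(y[i]) * 2 > k]
    return [X[i] for i in keep], [y[i] for i in keep]

def smote_enn(X, y, k=3, seed=0):
    """SMOTE followed by ENN; assumes binary labels with 1 as the minority."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == 1]
    n_majority = len(X) - len(minority)
    synthetic = smote(minority, n_majority - len(minority), k, rng)
    return edited_nearest_neighbours(list(X) + synthetic,
                                     list(y) + [1] * len(synthetic), k)

def stratified_split(y, train_frac, rng):
    """Split indices per class so both parts keep the class ratio (assumption)."""
    train, test = [], []
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return train, test

# Hypothetical toy data: 90 majority (label 0) and 10 minority (label 1) records.
rng = random.Random(42)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(90)] + \
    [(rng.gauss(2.5, 1), rng.gauss(2.5, 1)) for _ in range(10)]
y = [0] * 90 + [1] * 10

train_idx, test_idx = stratified_split(y, 0.8, rng)
X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]  # never resampled

X_res, y_res = smote_enn(X_train, y_train)
print("train before:", y_train.count(0), y_train.count(1))
print("train after: ", y_res.count(0), y_res.count(1))
```

Keeping the test records out of the resampling step means that evaluation reflects the original class distribution rather than the artificially balanced one.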

Section: Model Development (mentioning)

confidence: 99%

Machine learning models to detect social distress, spiritual pain, and severe physical psychological symptoms in terminally ill patients with cancer from unstructured text data in electronic medical records

et al. 2022


Background: Few studies have developed automatic systems for identifying social distress, spiritual pain, and severe physical and psychological symptoms from text data in electronic medical records. Aim: To develop models to detect social distress, spiritual pain, and severe physical and psychological symptoms in terminally ill patients with cancer from unstructured text data contained in electronic medical records. Design: A retrospective study of 1,554,736 narrative clinical records was analyzed 1 month before patients died. Supervised machine learning models were trained to detect comprehensive symptoms, and the performance of the models was tested using the area under the receiver operating characteristic curve (AUROC) and precision recall curve (AUPRC). Setting/participants: A total of 808 patients was included in the study using records obtained from a university hospital in Japan between January 1, 2018 and December 31, 2019. As training data, we used medical records labeled for detecting social distress ( n = 10,000) and spiritual pain ( n = 10,000), and records that could be combined with the Support Team Assessment Schedule (based on date) for detecting severe physical/psychological symptoms ( n = 5409). Results: Machine learning models for detecting social distress had AUROC and AUPRC values of 0.98 and 0.61, respectively; values for spiritual pain, were 0.90 and 0.58, respectively. The machine learning models accurately identified severe symptoms (pain, dyspnea, nausea, insomnia, and anxiety) with a high level of discrimination (AUROC > 0.8). Conclusion: The machine learning models could detect social distress, spiritual pain, and severe symptoms in terminally ill patients with cancer from text data contained in electronic medical records.
