Predicting the Depression of the South Korean Elderly using SMOTE and an Imbalanced Binary Dataset

Byeon, Haewon

doi:10.14569/ijacsa.2021.0120110

Cited by 8 publications

(10 citation statements)

References 35 publications

“…If there is an imbalance of classes, the group with a larger number of data is treated as more important, and the predictive performance decreases. Undersampling, oversampling, and synthetic minority over-sampling technique (SMOTE) methods are mainly used to deal with data imbalance [17], and it has been reported that the performance of SMOTE is generally better than that of undersampling and oversampling [18].…”

Section: Types Of Machine Learning (mentioning)

confidence: 99%

Screening dementia and predicting high dementia risk groups using machine learning

Byeon

2022

WJP

Self Cite

New technologies such as artificial intelligence, the internet of things, big data, and cloud computing have changed society and the economy as a whole, and the medical field in particular has tried to combine traditional examination methods with these technologies. One of the most notable areas of medical research is the prediction of high dementia risk groups using big data and artificial intelligence. This review introduces: (1) the definition, main concepts, and classification of machine learning, and how it differs from traditional statistical analysis models; and (2) the latest studies in mental science on detecting dementia and predicting high-risk groups, in order to help researchers who are taking on medical artificial intelligence in the field of psychiatry. A review of 4 studies that used machine learning to discriminate high-risk groups of dementia showed that various machine learning algorithms, such as boosting models, artificial neural networks, and random forests, were used for predicting dementia. The development of machine learning algorithms will change primary care in the future by applying advanced algorithms to detect high dementia risk groups.

“…The third domain believed to affect subjective health is economic activity. Economic activity makes earning and life, in general, more energetic, as the daily routine encourages people to become more diligent [39, 40, 41, 42]. Previous studies have demonstrated a link between economic activity and subjective health.…”

Section: Review Of Literature and Hypotheses Development (mentioning)

confidence: 99%

Antecedents of Subjective Health among Korean Senior Citizens Using Archival Data

Moon, Woo, Shim, et al.

2022

Behavioral Sciences

This study aimed to investigate the determinants of subjective health among South Korean senior citizens. Secondary data for the year 2018 was used from the Senior Citizen Research Panel data collected by the Korea Employment Information Service. A total of 3822 valid observations were analyzed. The dependent variable was subjective health, and the independent variables were religion participation, social gathering participation, economic activity, food expenditure, leisure expenditure, travel frequency, and art watching frequency. Descriptive analysis, correlation matrix, and independent t-test were carried out for data analysis. Multiple linear regression analysis was employed using assets, age, and gender as control variables to test the research hypotheses. The results indicate that all the proposed attributes have a significant positive impact on the subjective health of Korean senior citizens, with implications for policy making.

“…The imbalanced data problem exists in many datasets; as a result, classifier models are biased against the minority class and are unable to predict it accurately [13]. In contrast, most machine learning models perform better when applied to balanced datasets [14,15,16,17].…”

Section: A Data-level Approach and Imbalanced Data (mentioning)

confidence: 99%

“…Because the sample size grows, the oversampling technique takes longer to construct a model and can cause overfitting, since it duplicates samples from the minority class [23,24]. b) SMOTE: SMOTE is similar to random oversampling.…”

Section: ) Over-sampling Technique (mentioning)

confidence: 99%
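
The citation statement above contrasts random oversampling, which duplicates minority samples, with SMOTE. The following is a minimal sketch of that difference, assuming the common description of SMOTE as interpolation between a minority sample and one of its k nearest minority neighbours. The function names, the toy points, and k=2 are illustrative assumptions and are not taken from any of the cited papers.

```python
import math
import random


def random_oversample(minority, n_new, rng):
    """Plain random oversampling: return copies of existing minority rows."""
    return [rng.choice(minority) for _ in range(n_new)]


def smote_like(minority, n_new, k, rng):
    """SMOTE-style sampling: interpolate between a minority row and one of
    its k nearest minority neighbours (Euclidean distance)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base row, excluding the row itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


rng = random.Random(0)
# Hypothetical two-feature minority class, for illustration only.
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.2), (2.5, 2.4), (3.0, 1.8)]

duplicates = random_oversample(minority, n_new=6, rng=rng)
synthetic = smote_like(minority, n_new=6, k=2, rng=rng)
```

Every row in `duplicates` is an exact copy of an existing minority row, which is the overfitting risk the statement describes. Rows in `synthetic` lie on line segments between existing minority rows, so they add new points while staying inside the region the minority class already occupies.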

Improving Imbalanced Data Classification in Auto Insurance by the Data Level Approaches

Hanafy, Ming

2021

IJACSA

Predicting the frequency of insurance claims has become a significant challenge because the datasets are imbalanced: the number of claims that occur is usually much lower than the number that do not. As a result, classification models tend to have a limited ability to predict the occurrence of claims. In this paper, we use various data-level approaches to address the imbalanced data problem in the insurance industry. We developed 32 machine learning models for predicting insurance claims occurrence {(undersampling, over-sampling, the combination of over- and undersampling (hybrid), and SMOTE) × (three decision tree models, three boosting models, and two bagging models) = 32}, and we compared the models' accuracies, sensitivities, and specificities to assess their prediction performance. The dataset contains 81628 records, each of which is a car insurance claim. There were 5714 claims that occurred and 75914 claims that didn't occur. According to the findings, the AdaBoost classifier gave the most accurate predictions with oversampling and with the hybrid method: a sensitivity of 92.94%, a specificity of 99.82%, and an accuracy of 99.4% with oversampling, and a sensitivity of 92.48%, a specificity of 99.63%, and an accuracy of 99.1% with the hybrid method. This paper confirmed that, when analyzing imbalanced data, the AdaBoost classifier, whether using oversampling or the hybrid process, could generate more accurate models than other boosting models, decision tree models, and bagging models.
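
The abstract above reports sensitivity and specificity alongside accuracy. The short sketch below shows why that matters for the class counts it gives (5714 occurring claims, 75914 non-occurring claims). The always-negative classifier is a hypothetical baseline introduced here for illustration; it is not one of the paper's 32 models, and the function name `rates` is an assumption.

```python
def rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of actual positives that are detected
    specificity = tn / (tn + fp)   # share of actual negatives that are detected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Class counts quoted in the abstract.
claims, non_claims = 5714, 75914

# Hypothetical baseline: predict "no claim" for every record.
sens, spec, acc = rates(tp=0, fn=claims, tn=non_claims, fp=0)
print(f"sensitivity={sens:.2%} specificity={spec:.2%} accuracy={acc:.2%}")
```

Under these assumptions the baseline reaches roughly 93% accuracy while detecting none of the claims, which is why accuracy alone says little on a dataset this imbalanced.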
