2023
DOI: 10.2196/44081

Issue of Data Imbalance on Low Birthweight Baby Outcomes Prediction and Associated Risk Factors Identification: Establishment of Benchmarking Key Machine Learning Models With Data Rebalancing Strategies

Abstract: Background: Low birthweight (LBW) is a leading cause of neonatal mortality in the United States and a major causative factor of adverse health effects in newborns. Identifying high-risk patients early in prenatal care is crucial to preventing adverse outcomes. Previous studies have proposed various machine learning (ML) models for the LBW prediction task, but they were limited by small and imbalanced data sets. Some authors attempted to address this through different data rebalancing methods. However, m…

Cited by 11 publications (5 citation statements)
References 77 publications

“…It is worth emphasizing that an ensemble training method based on BalanceCascade adopted to handle imbalanced data sets may have certain reference value for some research related to ML. After all, class-imbalance is a common phenomenon in medical research related to disease diagnosis [ 33 , 36 ].…”
Section: Discussion (mentioning)
confidence: 99%
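The BalanceCascade idea mentioned above can be illustrated with a small, simplified sketch. It is not the citing study's implementation, whose configuration is not given here. The original formulation (Liu, Wu and Zhou) trains an AdaBoost ensemble at each stage. For brevity, this sketch swaps in a hypothetical nearest-centroid scorer (`fit_centroid`). Each stage trains on all minority rows plus an equal-sized sample of the remaining majority rows. It then keeps only the fraction f = (|P|/|N|)^(1/(T-1)) of majority rows that the stage finds hardest. The function names, the synthetic data, and the mean-score vote are assumptions made for illustration.

```python
import math
import random

def fit_centroid(pos, neg):
    """Hypothetical base learner (stand-in for AdaBoost): nearest-centroid scoring.
    A higher score means the row looks more like the minority (positive) class."""
    def mean(rows):
        return tuple(sum(col) / len(rows) for col in zip(*rows))
    c_pos, c_neg = mean(pos), mean(neg)
    return lambda x: math.dist(x, c_neg) - math.dist(x, c_pos)

def balance_cascade(minority, majority, n_stages=4, seed=0):
    """Simplified BalanceCascade. Assumes len(majority) > len(minority) and n_stages >= 2."""
    rng = random.Random(seed)
    # Keep ratio f = (|P| / |N|) ** (1 / (T - 1)): after T - 1 stages the
    # majority pool has shrunk to roughly the size of the minority class.
    keep = (len(minority) / len(majority)) ** (1 / (n_stages - 1))
    pool = list(majority)
    stages = []
    for _ in range(n_stages):
        # Balanced training set: all minority rows + an equal-sized majority sample.
        sample = rng.sample(pool, len(minority))
        score = fit_centroid(minority, sample)
        stages.append(score)
        # Keep the hardest majority rows (most positive-looking) and drop the rest
        # as already handled by this stage (a threshold at the f-quantile).
        pool.sort(key=score, reverse=True)
        pool = pool[:max(len(minority), round(len(pool) * keep))]
    return stages

def predict(stages, x):
    """Ensemble decision: mean stage score thresholded at 0 (1 = minority class)."""
    return int(sum(s(x) for s in stages) / len(stages) > 0)

# Usage on synthetic 2-D data: 50 minority rows near (3, 3), 500 majority rows near (0, 0).
rng = random.Random(1)
minority = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(50)]
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
stages = balance_cascade(minority, majority)
print(predict(stages, (3.0, 3.0)), predict(stages, (0.0, 0.0)))
```

Undersampling in stages keeps every minority row in every stage. Later stages concentrate on the majority rows that earlier stages found hard to separate, rather than discarding majority data at random once.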
“…To solve the imbalance issue of our data set, we used the synthetic minority oversampling technique to balance the training data set. The synthetic minority oversampling technique synthesizes new data from existing data using k-nearest neighbors and inserts them into the original data set [ 21 ]. The data set was randomly divided into 4:1 at a base training set (462,203/577,854, 79.98%) and a base test set (115,651/577,854, 20.01%) with equal distribution of different classes of patient data.…”
Section: Methods (mentioning)
confidence: 99%
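The statement above describes two steps: a stratified 4:1 train/test split, followed by SMOTE on the training data only. The following Python sketch shows both under stated assumptions. It uses a basic SMOTE variant in which each synthetic row is placed at a random point on the segment between a minority row and one of its k nearest minority neighbours. The helper names `stratified_split` and `smote`, the choice k = 5, and the toy data are all hypothetical, not taken from the cited study.

```python
import math
import random

def stratified_split(rows, labels, test_frac=0.2, seed=0):
    """Split each class separately so train and test keep the same class mix (4:1 by default)."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test += [(rows[i], cls) for i in idx[:n_test]]
        train += [(rows[i], cls) for i in idx[n_test:]]
    return train, test

def smote(minority, n_new, k=5, seed=0):
    """Basic SMOTE: each synthetic row lies on the segment between a random
    minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [r for j, r in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda r: math.dist(base, r))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the base -> neighbour segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic

# Hypothetical data: 10 positives out of 100 rows, 2 features.
rng = random.Random(42)
labels = [1] * 10 + [0] * 90
rows = [(rng.gauss(2 * y, 1), rng.gauss(2 * y, 1)) for y in labels]

train, test = stratified_split(rows, labels)   # 80 train rows, 20 test rows
minority = [x for x, y in train if y == 1]
majority = [x for x, y in train if y == 0]
# Oversample the training set only; the test set keeps its original imbalance.
balanced = train + [(x, 1) for x in smote(minority, len(majority) - len(minority))]
```

Splitting before oversampling matters. If synthetic rows were generated from the full data set, points interpolated from test rows could leak into training and inflate test scores.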
“…In our study, there exists an imbalance, as the mortality rate is approximately 5%. To address this imbalance, we utilized a weight rebalancing technique to adjust the weights of both the majority and minority classes [ 35 ]. Solely the training dataset underwent balancing.…”
Section: Methods (mentioning)
confidence: 99%
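The statement above does not specify the weight rebalancing technique of reference [35]. One common choice is the "balanced" heuristic w_c = n / (k · n_c), the formula scikit-learn applies for `class_weight="balanced"`. The sketch below assumes that heuristic and shows how the weights could enter a weighted binary cross-entropy. The function names and the 5% toy label vector are hypothetical.

```python
import math
from collections import Counter

def balanced_class_weights(labels):
    """w_c = n_samples / (n_classes * n_c), the same formula scikit-learn uses
    for class_weight="balanced": each class ends up with equal total weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}

def weighted_log_loss(y_true, p_pred, weights):
    """Binary cross-entropy with every row scaled by its class weight,
    normalised by the total weight."""
    eps = 1e-12
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total -= weights[y] * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / sum(weights[y] for y in y_true)

# Hypothetical training labels with a ~5% positive (mortality) rate.
y_train = [1] * 50 + [0] * 950
weights = balanced_class_weights(y_train)  # weights[1] == 10.0, weights[0] is about 0.526
loss = weighted_log_loss(y_train, [0.5] * len(y_train), weights)
```

Computing the weights from the training labels alone mirrors the statement that only the training data set was balanced. Evaluation on the test set then reflects the true class frequencies.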