Background and Aims: Birth weight is a crucial determinant of a newborn's health and future well-being. Low birth weight (LBW), defined by the World Health Organization as a birth weight of less than 2,500 grams, is a widespread global problem. LBW can have severe consequences, including neonatal mortality and health problems throughout life. This study uses data from the Bangladesh Demographic and Health Survey (BDHS) 2017-2018 to identify the most relevant features associated with LBW and the best model for predicting it.
Methods: Data were extracted from the BDHS 2017-2018. The Boruta algorithm and a wrapper method were used to select important features. Six machine learning classifiers, Decision Tree (DT), Support Vector Machine (SVM), Naïve Bayes (NB), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Adaptive Boosting (AdaBoost, AB), were then compared to determine the best model for predicting LBW.
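As an illustration of what a wrapper method does, the sketch below runs greedy forward selection scored by validation AUC. It is a minimal example under stated assumptions, not the study's implementation: the abstract does not say which wrapper variant was used, the data are synthetic rather than BDHS records, and a toy class-centroid scorer stands in for the DT and RF classifiers. All function and variable names are hypothetical.

```python
import random

def auc(scores, labels):
    """Rank-based AUC: chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_centroid_scorer(X, y, features):
    """Toy stand-in model (assumption): project rows onto the difference of class means."""
    def mean(rows, j):
        return sum(r[j] for r in rows) / len(rows)
    pos = [r for r, t in zip(X, y) if t == 1]
    neg = [r for r, t in zip(X, y) if t == 0]
    weights = {j: mean(pos, j) - mean(neg, j) for j in features}
    return lambda row: sum(w * row[j] for j, w in weights.items())

def forward_select(X_train, y_train, X_val, y_val, n_features):
    """Greedy wrapper: repeatedly add the feature that most improves validation AUC."""
    selected, best_auc = [], 0.5  # 0.5 is the AUC of a chance-level model
    while True:
        candidates = []
        for j in range(n_features):
            if j in selected:
                continue
            model = fit_centroid_scorer(X_train, y_train, selected + [j])
            candidates.append((auc([model(r) for r in X_val], y_val), j))
        if not candidates:
            break
        score, j = max(candidates)
        if score <= best_auc:  # stop when no remaining feature helps
            break
        selected.append(j)
        best_auc = score
    return selected, best_auc

# Synthetic data (assumption): 6 features, only features 0 and 3 carry signal.
rng = random.Random(0)

def make_row():
    x = [rng.gauss(0, 1) for _ in range(6)]
    y = 1 if 2.0 * x[0] + 1.5 * x[3] + rng.gauss(0, 1) > 1.5 else 0
    return x, y

data = [make_row() for _ in range(600)]
X = [x for x, _ in data]
y = [t for _, t in data]
X_train, y_train, X_val, y_val = X[:400], y[:400], X[400:], y[400:]

selected, best_auc = forward_select(X_train, y_train, X_val, y_val, n_features=6)
print("selected features:", selected, "validation AUC:", round(best_auc, 3))
```

Unlike a filter, the wrapper scores each candidate subset by actually fitting and evaluating the model, so the selected features can differ from one classifier to another, as they do across the models in this study.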
Results: The Boruta algorithm identified eleven significant features: the respondent's age, highest education level, educational attainment, wealth index, age at first birth, weight, height, BMI, age at first sexual intercourse, birth order number, and whether the child is a twin. With these features, the DT, SVM, NB, RF, XGBoost, and AB models achieved AUCs of 0.538, 0.532, 0.529, 0.549, 0.507, and 0.509, respectively. With the wrapper method, the DT and RF models selected three significant features (region, whether the child is a twin, and delivery by cesarean section), with AUCs of 0.5550 and 0.5749, respectively. The SVM, NB, and AB models selected only "child is twin", with an AUC of 0.5120, and the XGBoost model selected "age at 1st sex" and "child is twin", with an AUC of 0.508.
Conclusions: The analysis suggests that Random Forest (RF) is the most effective model for predicting low birth weight (LBW) and that the wrapper method is the best feature selection technique. Notable predictors of LBW are whether the child is a twin, region, and delivery by cesarean section.