Purpose
To construct and compare machine learning models for predicting the risk of gestational diabetes mellitus (GDM).
Method
Clinical data were retrospectively collected from 2048 pregnant women who gave birth at Shunde Women’s and Children’s Hospital of Guangdong Medical University between June 2019 and June 2021. Four models (logistic regression, backpropagation neural network, random forest, and support vector machine) were constructed using RStudio and Python. The logistic regression and random forest models were used to identify significant influencing factors. The area under the receiver operating characteristic curve (AUC) was used to evaluate the predictive performance and discriminative ability of the models, and the Hosmer–Lemeshow test was used to assess goodness of fit.
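The two evaluation measures named in the Method can be illustrated with a minimal sketch. It assumes binary outcome labels (1 = GDM, 0 = no GDM) and model-predicted probabilities. The function names, the equal-size grouping rule, and the toy inputs in the test are illustrative assumptions and are not taken from the study, which used RStudio and Python packages that the abstract does not name.

```python
import math


def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def chi2_sf_even(x, df):
    """Upper-tail probability of a chi-square variable; closed form valid
    only for even, positive degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires even, positive df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow statistic and p-value over equal-size risk groups
    (assumption: groups is even and greater than 2, so df = groups - 2 is even)."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    if n < groups:
        raise ValueError("need at least one observation per group")
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        observed = sum(y for _, y in chunk)   # observed events in the group
        expected = sum(p for p, _ in chunk)   # expected events in the group
        if 0.0 < expected < size:             # skip degenerate groups (would divide by zero)
            stat += (observed - expected) ** 2 / expected
            stat += ((size - observed) - (size - expected)) ** 2 / (size - expected)
    return stat, chi2_sf_even(stat, groups - 2)
```

In the Hosmer–Lemeshow test, a large p-value (conventionally above 0.05) means there is no evidence of poor calibration, which is the sense in which a model is described as fitting well.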
Results
Age, glycated hemoglobin (GHB), fasting blood glucose (FBG), white blood cell count (WBC), hemoglobin (HB), and activated partial thromboplastin time (APTT) were identified as significant factors associated with GDM. The random forest model had the best predictive performance (accuracy, 78.07%; Youden index, 1.56). The AUC exceeded 0.78 for all four models. The Hosmer–Lemeshow test indicated that all four models fit the data well.
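Accuracy and the Youden index are both derived from a confusion matrix. The sketch below assumes binary labels (1 = GDM, 0 = no GDM) and the conventional definition of the Youden index, J = sensitivity + specificity − 1. The function name and the toy labels are hypothetical and do not reproduce the study's data or code.

```python
def classification_summary(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and Youden index from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1.0,
    }


# Toy example (hypothetical labels): 3 TP, 1 FN, 3 TN, 1 FP
summary = classification_summary(
    [1, 1, 1, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 1, 0, 1],
)
```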
Conclusion
Age, GHB, FBG, WBC, HB, and APTT are important influencing factors and potential early predictors of GDM. Among the tested models, random forest performed best at predicting the risk of GDM in early pregnancy.