Objective
To predict preterm birth in nulliparous women using logistic regression and machine learning.
Design
Population-based retrospective cohort.
Participants
Nulliparous women (N = 112,963) with a singleton gestation who gave birth between 20 and 42 weeks of gestation in Ontario hospitals from April 1, 2012, to March 31, 2014.
Methods
We used data available during the first and second trimesters to build logistic regression and machine learning models in a “training” sample to predict overall and spontaneous preterm birth. We assessed model performance in an independent “validation” sample using sensitivity, specificity, positive predictive value, negative predictive value, and area under the receiver operating characteristic curve (AUC).
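The performance measures listed above can all be computed from a model's predicted probabilities on the validation sample. The following is a minimal Python sketch, not the authors' code: the function names, the 0.5 classification threshold, and the toy labels and scores are hypothetical, and AUC is estimated with the rank-based (Mann–Whitney) formulation, which may differ from the software used in the study.

```python
from typing import Sequence


def _ratio(num: int, den: int) -> float:
    """Return num/den, or NaN when the denominator is zero (e.g. no predicted positives)."""
    return num / den if den else float("nan")


def threshold_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> dict[str, float]:
    """Sensitivity, specificity, PPV and NPV after dichotomising scores at `threshold`.

    Labels must be coded 1 = preterm birth, 0 = term birth.
    """
    tp = fp = tn = fn = 0
    for truth, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if truth == 1 and predicted == 1:
            tp += 1
        elif truth == 0 and predicted == 1:
            fp += 1
        elif truth == 0 and predicted == 0:
            tn += 1
        else:  # truth == 1, predicted == 0
            fn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def auc_mann_whitney(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outranks a random non-case (ties count 1/2).

    O(n_cases * n_noncases): fine for illustration, too slow for a cohort of ~100,000.
    """
    cases = [s for t, s in zip(y_true, scores) if t == 1]
    noncases = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for c in cases:
        for n in noncases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(noncases))


# Hypothetical toy validation data, not from the study.
y_val = [1, 0, 1, 0, 0, 0, 1, 0]
p_val = [0.9, 0.2, 0.4, 0.6, 0.1, 0.3, 0.8, 0.5]
print(threshold_metrics(y_val, p_val))
print(auc_mann_whitney(y_val, p_val))
```

On this toy data the threshold metrics are sensitivity 2/3, specificity 0.6, PPV 0.5 and NPV 0.75, and the AUC is 13/15 (about 0.87). In practice a library routine such as scikit-learn's `roc_auc_score` would typically replace the pairwise loop.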
Results
In the first trimester, logistic regression identified 13 variables associated with preterm birth; the strongest predictors were diabetes (Type I: adjusted odds ratio (AOR): 4.21; 95% confidence interval (CI): 3.23–5.42; Type II: AOR: 2.68; 95% CI: 2.05–3.46) and abnormal pregnancy-associated plasma protein A concentration (AOR: 2.04; 95% CI: 1.80–2.30). The highest first-trimester AUC in the validation sample was 60% (95% CI: 58–62%), achieved with artificial neural networks. In the second trimester, 17 variables were significantly associated with preterm birth, among which complications during pregnancy had the highest AOR (13.03; 95% CI: 12.21–13.90). With second-trimester data, the AUC of artificial neural networks in the validation sample increased to 65% (95% CI: 63–66%), and to 80% (95% CI: 79–81%) when complications during pregnancy were included. All models yielded negative predictive values of 94–97% for spontaneous preterm birth in both the first and second trimesters.
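An adjusted odds ratio from logistic regression is the exponentiated model coefficient, exp(β), and a Wald 95% CI is exp(β ± 1.96·SE). The sketch below assumes Wald intervals (the abstract does not state which interval method was used); the function names and the coefficient are hypothetical. It also shows how the standard error implied by a reported interval, such as 1.80–2.30, can be approximately back-calculated.

```python
import math
from statistics import NormalDist

# Two-sided 95% critical value of the standard normal (approximately 1.96).
Z_95 = NormalDist().inv_cdf(0.975)


def odds_ratio_with_ci(beta: float, se: float, z: float = Z_95) -> tuple[float, float, float]:
    """Convert a log-odds coefficient and its standard error into (OR, lower, upper).

    Assumes a Wald interval, which is symmetric on the log-odds scale.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower: float, upper: float, z: float = Z_95) -> float:
    """Approximate the log-odds standard error implied by a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Hypothetical coefficient and standard error, not values from the study.
odds_ratio, lower, upper = odds_ratio_with_ci(beta=0.5, se=0.1)
print(f"OR = {odds_ratio:.2f} (95% CI: {lower:.2f}-{upper:.2f})")

# SE implied by a reported interval of 1.80-2.30 (approximate, because of rounding).
print(f"implied SE = {se_from_ci(1.80, 2.30):.3f}")
```

Because the interval is symmetric only on the log scale, a reported AOR sits near the geometric mean of its CI limits rather than the arithmetic mean.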
Conclusion
Although artificial neural networks provided slightly higher AUCs than logistic regression, prediction of preterm birth in the first trimester remained elusive. However, including second-trimester data improved prediction to a moderate level with both logistic regression and machine learning approaches.