Background: Early identification of newborns at risk for adverse events is a major challenge for hospitals and clinicians. One such event is neonatal respiratory distress syndrome (RDS), the most widespread respiratory disorder in premature newborns and the leading cause of death among them. Machine learning is widely used to analyze medical data and may be useful for the early detection of RDS.
Objective: This study aimed to develop a model to predict neonatal RDS and to identify the factors affecting it using data mining.
Materials and Methods: The original dataset in this cross-sectional study was extracted from the medical records of newborns diagnosed with RDS from July 2017 to July 2018 at Alzahra Hospital, Tabriz, Iran. The data cover 1469 neonates and include information about their mothers. After preprocessing, the data were used to build classification models for predicting RDS episodes with the following machine learning techniques: support vector machine, naïve Bayes, classification tree, random forest, CN2 rule induction, and neural network. The models were compared according to their accuracy.
Results: Random forest gave the best results, with an accuracy of 0.815, sensitivity of 0.802, specificity of 0.812, and area under the curve of 0.843.
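For readers less familiar with these measures, the sketch below shows how accuracy, sensitivity, specificity, and area under the curve are commonly computed for a binary classifier. It is an illustration only, not the study's analysis code: the labels, scores, 0.5 decision threshold, and function names are all hypothetical, and the toy numbers are unrelated to the study data.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels, where 1 = RDS and 0 = no RDS."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and predicted probabilities of RDS.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

# Assumed decision threshold of 0.5 (not taken from the study).
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all cases classified correctly
sensitivity = tp / (tp + fn)                 # share of true RDS cases detected
specificity = tn / (tn + fp)                 # share of non-RDS cases correctly ruled out
roc_auc = auc(y_true, scores)

print(accuracy, sensitivity, specificity, roc_auc)
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas the area under the curve summarizes ranking quality across all thresholds, which is one reason the four measures are usually reported together.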
Conclusion: The findings of our study suggest that new approaches such as data mining may support medical decisions and improve the diagnosis of neonatal RDS. Using a random forest to predict neonatal RDS appears feasible and may help decrease postpartum complications in neonatal care.
Key words: Data mining, Classification, Neonatal respiratory distress syndrome, Newborn, Machine learning.