Objective
The most serious complication of Kawasaki syndrome (KS) is coronary artery lesions (CAL). Without intervention, about 20%-25% of children with KS develop severe CAL. Machine learning (ML), a branch of artificial intelligence (AI), integrates large, complex data sets and uses them to predict future events. It can also reveal relationships that doctors may not easily find. This study presents models that use different algorithms to predict the risk of CAL in children with KS, with the aim of diagnosing CAL early.
Methods
A total of 158 children were enrolled from Women and Children’s Hospital, Qingdao University, and split 7:3 into training and test sets for model building and validation. Clinical manifestations and auxiliary examination results were collected as input features for the models, based on the latest (6th edition) diagnostic guidelines. Before modeling, principal component analysis (PCA) was used for dimension reduction to remove the high correlation between features, and the Synthetic Minority Oversampling Technique (SMOTE) was used to improve accuracy. Three classifiers were constructed: Random Forest (RF), logistic regression (LG), and eXtreme Gradient Boosting (XGBoost).
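The split and oversampling steps can be illustrated with a minimal standard-library sketch. The abstract does not say how the 7:3 split was drawn or whether SMOTE was applied before or after it, so the sketch assumes a stratified split with oversampling of the training set only. The function names, the toy data, and the choice of k = 5 neighbours are all hypothetical and are not taken from the study.

```python
import math
import random


def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train_idx, test_idx), splitting each class by train_frac."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train_idx += idx[:cut]
        test_idx += idx[cut:]
    return sorted(train_idx), sorted(test_idx)


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy data (not the study's): 50 samples, 2 features, 10 positives.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(50)]
y = [1] * 10 + [0] * 40

train_idx, test_idx = stratified_split(y)

# Assumption: oversample the minority class in the training set only,
# so the test set contains real observations exclusively.
minority = [X[i] for i in train_idx if y[i] == 1]
n_majority = sum(1 for i in train_idx if y[i] == 0)
synthetic = smote(minority, n_new=n_majority - len(minority))

X_train = [X[i] for i in train_idx] + synthetic
y_train = [y[i] for i in train_idx] + [1] * len(synthetic)
print(len(train_idx), len(test_idx), len(synthetic))
```

Oversampling only the training portion is one common design choice, because synthetic points in the test set would make the reported metrics harder to interpret. In practice, libraries such as scikit-learn and imbalanced-learn provide tested implementations of these steps.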
Results
For RF, the sensitivity and specificity were 0.8 and 0.906, respectively, and the area under the curve (AUC) was 0.972. For LG, the sensitivity and specificity were 0.6 and 0.976, respectively. For XGBoost, the sensitivity and specificity were 0.2 and 0.953, respectively.
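For readers less familiar with these metrics, the following minimal sketch shows how sensitivity, specificity, and AUC are conventionally computed from test-set labels, predictions, and scores. The function names and example values are hypothetical and are not the study's data. It assumes that both classes are present in the test set.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: 5 positives (4 detected), 10 negatives (9 correct).
y_true = [1, 1, 1, 1, 1] + [0] * 10
y_pred = [1, 1, 1, 1, 0] + [0] * 9 + [1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)  # 0.8 0.9

# Hypothetical scores for 3 positives and 4 negatives.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]
print(auc(labels, scores))  # 11 of 12 pairs ranked correctly
```

With only a handful of positive cases in a test set, each missed case shifts sensitivity by a large step, which is worth keeping in mind when comparing classifiers on small samples.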
Conclusion
Models were built with three different algorithms to find the one with the best sensitivity and specificity. RF was superior to the other methods and provides a reference for the prevention of CAL.