Objective Severe Mycoplasma pneumoniae pneumonia (SMPP) is difficult to diagnose because its clinical features overlap with those of other common respiratory diseases. This study aimed to develop and validate machine learning (ML) models for the early identification of SMPP and for predicting the risk of liver and heart damage in SMPP using readily accessible laboratory indicators.
Methods Cohort 1 was divided into an SMPP group and a group with other respiratory diseases. Cohort 2 was divided into myocardial damage, liver damage, and non-damage groups. Models built with five ML algorithms were compared to select the best-performing algorithm and model. Receiver operating characteristic (ROC) curves, accuracy, sensitivity, and other performance indicators were used to evaluate each model. Feature importance and SHapley Additive exPlanations (SHAP) values were used to improve model interpretability. Cohort 3 was used for external validation.
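The performance indicators named above can be illustrated with a minimal sketch for the binary case (for example, SMPP versus other respiratory diseases). This is not the study's pipeline: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation rather than any particular library.

```python
from typing import Sequence, Tuple


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs in which the positive sample gets the higher score.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_sensitivity_specificity(
    y_true: Sequence[int],
    scores: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the study
) -> Tuple[float, float, float]:
    """Threshold the scores and derive the metrics from the confusion matrix."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1 and pred == 0:
            fn += 1
        elif y == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    return accuracy, sensitivity, specificity


# Toy data (hypothetical): 1 = SMPP, 0 = other respiratory disease.
labels = [1, 1, 1, 0, 0, 0]
predicted_risk = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

auc = auc_roc(labels, predicted_risk)
acc, sens, spec = accuracy_sensitivity_specificity(labels, predicted_risk)
print(f"AUC={auc:.3f} accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

The AUC does not depend on a decision threshold, whereas accuracy and sensitivity do, which is one reason both kinds of indicator are typically reported when comparing models.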
Results In Cohort 1, the SMPP differential diagnostic model (Model 1) developed with the LightGBM algorithm achieved the highest performance, with an area under the ROC curve (AUC) of 0.968. In Cohort 2, the LightGBM model (Model 2) performed best in distinguishing myocardial damage, liver damage, and non-damage in SMPP patients (accuracy = 0.818). Feature importance and SHAP values indicated that age and CK-MB were the key contributors to Model 2’s output. The diagnostic and predictive abilities of the ML models were validated in Cohort 3, indicating that the models have a degree of clinical generalizability.
Conclusion Model 1 and Model 2, both constructed with the LightGBM algorithm, showed excellent performance in the differential diagnosis of SMPP and in predicting the risk of organ damage in children.