Background
Drug-induced long-QT syndrome (diLQTS) is a major concern among hospitalized patients, for whom prediction models capable of identifying individualized risk could be useful to guide monitoring. We have previously demonstrated the feasibility of using machine learning to predict the risk of diLQTS. In that work, deep learning models provided superior accuracy for risk prediction but were limited by a lack of interpretability.
Objective
In this investigation, we sought to examine the potential trade-off between interpretability and predictive accuracy that comes with using more complex models to identify patients at risk for diLQTS. We planned to compare a deep learning algorithm for predicting diLQTS with a more interpretable algorithm based on cluster analysis, which would allow medication- and subpopulation-specific evaluation of risk.
Methods
We examined the risk of diLQTS among 35,639 inpatients who were treated between 2003 and 2018 with at least 1 of 39 medications associated with risk of diLQTS and who had an electrocardiogram recorded in the system within 24 hours of medication administration. Predictors included over 22,000 diagnoses and medications present at the time of medication administration, and cases of diLQTS were defined as a corrected QT interval (QTc) over 500 milliseconds after treatment with a culprit medication. The interpretable model was developed using cluster analysis (K=4 clusters), and risk was assessed for specific medications and medication classes. The deep learning model was a 6-layer neural network trained on all predictors, using previously identified hyperparameters.
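The cluster-based risk assessment described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. Only the QTc > 500 ms case definition and K=4 come from the text; the choice of k-means (Lloyd's algorithm) as the clustering method and of a within-cluster odds ratio as the medication-specific risk measure are assumptions made for illustration, and all function names are hypothetical.

```python
from collections import defaultdict

# From the text: a case is a corrected QT interval (QTc) over 500 ms after treatment.
QTC_THRESHOLD_MS = 500
# From the text: the interpretable model used K=4 clusters.
K_CLUSTERS = 4


def label_case(qtc_ms):
    """Return 1 (diLQTS case) if QTc exceeds 500 ms, else 0."""
    return 1 if qtc_ms > QTC_THRESHOLD_MS else 0


def kmeans(points, k, iters=20):
    """Lloyd's k-means (ASSUMPTION: the text does not name the clustering method).

    Centroids start at the first k points so the sketch is deterministic;
    a real analysis would typically use random restarts or k-means++.
    """
    centroids = [list(pt) for pt in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, pt in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(pt, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [pt for pt, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def odds_ratio(exp_case, exp_non, unexp_case, unexp_non):
    """Odds ratio with a 0.5 continuity correction so empty cells do not divide by zero."""
    a, b, c, d = (x + 0.5 for x in (exp_case, exp_non, unexp_case, unexp_non))
    return (a * d) / (b * c)


def per_cluster_odds_ratios(covariates, exposed, qtc_values, k=K_CLUSTERS):
    """Cluster patients on covariates, then estimate one drug's odds ratio per cluster.

    covariates: per-patient 0/1 indicator vectors (diagnoses, other medications)
    exposed:    1 if the patient received the medication of interest, else 0
    qtc_values: post-treatment QTc in milliseconds
    ASSUMPTION: the odds ratio stands in for the study's unspecified risk measure.
    """
    clusters = kmeans(covariates, k)
    # Per cluster: [exposed cases, exposed non-cases, unexposed cases, unexposed non-cases]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for is_exposed, qtc_ms, c in zip(exposed, qtc_values, clusters):
        slot = (0 if is_exposed else 2) + (0 if label_case(qtc_ms) else 1)
        counts[c][slot] += 1
    return {c: odds_ratio(*cells) for c, cells in sorted(counts.items())}


# Usage: six toy patients in two obvious covariate groups, K=2 for this example.
covs = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0]]
drug = [1, 1, 0, 0, 1, 0]
qtc = [520, 440, 450, 530, 510, 430]
print(per_cluster_odds_ratios(covs, drug, qtc, k=2))
```

With the real cohort, each covariate vector would hold the more than 22,000 diagnosis and medication indicators, so a sparse representation and a library clustering routine would be far more practical than this pure-Python loop.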
Results
Among the medications, class III antiarrhythmic medications were associated with increased risk across all clusters. In noncritically ill patients without cardiovascular disease, propofol was associated with increased risk, whereas ondansetron was associated with decreased risk. Compared with deep learning, the interpretable approach was less accurate (area under the receiver operating characteristic curve [AUROC]: 0.65 vs 0.78) but had comparable calibration.
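To make the reported metrics concrete, the following Python sketch computes the area under the receiver operating characteristic curve (AUROC) as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, and builds a simple binned calibration table. The function names and the equal-width binning scheme are illustrative assumptions; the text does not state how calibration was assessed.

```python
def auroc(labels, scores):
    """AUROC as P(score of a random case > score of a random non-case), ties = 1/2.

    This pairwise form is O(n_cases * n_noncases); rank-based formulas give the
    same value in O(n log n) and are preferable at cohort scale.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one case and one non-case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_table(labels, scores, n_bins=10):
    """Equal-width bins of predicted risk -> (mean predicted, observed rate, count).

    ASSUMPTION: one simple way to inspect calibration; the text does not say
    which calibration method the study used.
    """
    bins = [[] for _ in range(n_bins)]
    for y, s in zip(labels, scores):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of exactly 1.0 goes in the top bin
        bins[idx].append((y, s))
    table = []
    for b in bins:
        if b:  # skip empty bins
            mean_pred = sum(s for _, s in b) / len(b)
            observed = sum(y for y, _ in b) / len(b)
            table.append((mean_pred, observed, len(b)))
    return table


# Usage with toy labels (1 = diLQTS case) and predicted risks.
y_true = [0, 0, 1, 1]
risk = [0.1, 0.4, 0.35, 0.8]
print(auroc(y_true, risk))  # 0.75
print(calibration_table(y_true, risk, n_bins=2))
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from non-cases, which is the scale on which the 0.65 and 0.78 values above are compared. Calibration is a separate property: a well-calibrated model has observed case rates close to its mean predicted risk in each bin.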
Conclusions
In summary, we found that an interpretable modeling approach was less accurate, but more clinically applicable, than deep learning for the prediction of diLQTS. Future investigations should consider this trade-off in the development of methods for clinical prediction.