Background
Machine learning (ML) techniques have been applied to a wide range of problems in healthcare systems. We aimed to determine the feasibility and accuracy of predicting patient portal use among diabetic patients using six different ML algorithms. We also compared model performance when only essential variables were used.
Methods
This was a single-center retrospective observational study. We included all diabetic patients seen at the study emergency department (ED) from March 1, 2019 to February 28, 2020. The primary outcome was the status of patient portal use. A total of 18 variables, consisting of patient sociodemographic characteristics, ED and clinic information, and patient medical conditions, were included to predict patient portal use. Six ML algorithms (logistic regression, random forest (RF), deep forest, decision tree, multilayer perceptron, and support vector machine) were used for these predictions. In the first step, predictions were made with all variables. Essential variables were then chosen via feature selection, and the predictions were repeated with only those variables. The performance measures (overall accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC)) of the patient portal predictions were compared.
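The two-step procedure described above can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: it assumes scikit-learn, uses synthetic data in place of the patient records, shows only RF out of the six algorithms, and picks the top eight variables by impurity-based importance, which is an assumed selection rule (the feature selection method is not specified here). The 70/30 split and all hyperparameters are likewise illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the study data (assumption): 18 predictors,
# with roughly 27% of samples in the positive class (portal users).
X, y = make_classification(
    n_samples=2000, n_features=18, n_informative=8, n_redundant=2,
    weights=[0.73, 0.27], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def evaluate(model, X_eval, y_eval):
    """Return accuracy, sensitivity, specificity, and AUC for a fitted binary classifier."""
    pred = model.predict(X_eval)
    prob = model.predict_proba(X_eval)[:, 1]
    # For labels {0, 1}, ravel() yields counts in the order tn, fp, fn, tp.
    tn, fp, fn, tp = confusion_matrix(y_eval, pred).ravel()
    return {
        "accuracy": accuracy_score(y_eval, pred),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "auc": roc_auc_score(y_eval, prob),
    }


# Step 1: fit and evaluate with all 18 variables.
full_model = RandomForestClassifier(n_estimators=200, random_state=0)
full_model.fit(X_train, y_train)
full_metrics = evaluate(full_model, X_test, y_test)

# Step 2: keep the 8 variables with the highest importance (assumed rule).
top8 = np.argsort(full_model.feature_importances_)[::-1][:8]

# Step 3: refit and evaluate with only the selected variables.
reduced_model = RandomForestClassifier(n_estimators=200, random_state=0)
reduced_model.fit(X_train[:, top8], y_train)
reduced_metrics = evaluate(reduced_model, X_test[:, top8], y_test)

print("all variables:      ", full_metrics)
print("selected variables: ", reduced_metrics)
```

Comparing the two printed dictionaries mirrors the comparison of performance measures between the full and reduced variable sets; the same `evaluate` helper could be applied to any other classifier that exposes `predict` and `predict_proba`.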
Results
A total of 77,977 unique patients were included in our final analysis. Among them, 23.4% (18,223) had diabetes mellitus (DM). Patient portal use was found in 26.9% of DM patients. Overall, the accuracy of predicting patient portal use was above 80% for five of the six ML algorithms. RF outperformed the others when all variables were used for patient portal predictions (accuracy 0.9876, sensitivity 0.9454, specificity 0.9969, and AUC 0.9712). When only eight essential variables were chosen, RF still outperformed the others (accuracy 0.9876, sensitivity 0.9374, specificity 0.9932, and AUC 0.9769).
Conclusion
Patient portal use can be predicted with fair accuracy using different ML algorithms. Moreover, feature selection techniques can improve model interpretability by focusing on the most relevant features while maintaining similar prediction accuracy.