Background
Numerous studies have used machine-learning techniques to predict the early onset of type 2 diabetes mellitus. However, fewer studies have sought to predict an appropriate diagnosis code for the condition, and ensemble techniques such as bagging and boosting have been used even less often. The present study aims to identify appropriate diagnosis codes for patients with type 2 diabetes mellitus by building a parsimonious multi-class prediction model that uses a minimal set of features. In addition, the importance of each feature for predicting the diagnosis code is reported.
Methods
This study included 149 patients with type 2 diabetes mellitus. The sample was collected from a large hospital in Taiwan between November 2017 and May 2018. Predictive models were built with instance-based, decision tree, deep neural network, and ensemble algorithms. Model performance was assessed using average accuracy, area under the receiver operating characteristic curve, Matthews correlation coefficient, macro-precision, recall, the weighted average of precision and recall, and model processing time. Information gain and gain ratio were used to quantify feature importance.
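To illustrate the two feature-importance measures named above, the following is a minimal sketch of information gain and gain ratio for a discrete feature, using their standard textbook definitions. It is not the study's implementation: the function names and the toy feature and label values are hypothetical and are not drawn from the study data.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def information_gain(feature, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Entropy of the labels within each feature value, weighted by group size.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain divided by the entropy of the feature (split information)."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


# Hypothetical toy data (NOT from the study): two candidate features, one label column.
labels = ["code_A", "code_A", "code_B", "code_B"]
two_level_feature = ["low", "low", "high", "high"]
four_level_feature = ["p1", "p2", "p3", "p4"]  # unique value per patient

ig_two = information_gain(two_level_feature, labels)
gr_two = gain_ratio(two_level_feature, labels)
ig_four = information_gain(four_level_feature, labels)
gr_four = gain_ratio(four_level_feature, labels)
```

In this toy case both features separate the labels perfectly and so have the same information gain, but the four-level feature receives a lower gain ratio because its split information is larger. This is the usual motivation for reporting gain ratio alongside information gain: it penalizes features with many distinct values.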
Results
The results showed that most algorithms, with the exception of the deep neural network, performed well on all performance indices for both the training and testing datasets. Ten features were identified, along with their importance in determining the diagnosis code for type 2 diabetes mellitus. The proposed predictive model can be further developed into a clinical diagnosis support system or integrated into existing healthcare information systems. Either application can support physicians in diagnosing patients with type 2 diabetes mellitus and thereby foster better patient-care planning.