Background
Because pediatric respiratory diseases that present with chronic cough are phenotypically similar, primary care doctors often misdiagnose them, and the misuse of examinations is prevalent. In the prediagnosis stage, the patient's chief complaint and other information in the electronic medical record (EMR) provide a powerful reference for respiratory experts to make a preliminary diagnosis and plan examinations. In this paper, we propose an intelligent prediagnosis system that predicts the disease diagnosis and recommends examinations based on EMR text.
Methods
We examined the clinical notes of 178,293 children with chronic cough symptoms from retrospective EMR data. The dataset was split 7:3 into training and testing sets, and 5% of the samples in the testing set were extracted for validation. We propose a medical-semantic-aware convolutional neural network (MSCNN) framework that accomplishes two downstream tasks from the same medical language model through transfer learning. First, a medical language model based on the word2vec algorithm was built to generate embeddings for the text data. Then, a text convolutional neural network (TextCNN) was used to build the models for disease prediction and examination recommendation.
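The core TextCNN operation named above (filters sliding over word embeddings, followed by max-over-time pooling) can be illustrated with a minimal sketch. It is not the MSCNN implementation: the function names, the toy two-dimensional embeddings standing in for word2vec vectors, and the hand-set filter weights are all assumptions made for illustration.

```python
def conv_max_pool(embedded, kernel, bias=0.0):
    """Slide one filter over the token sequence, apply ReLU, then max-pool over time.

    embedded: list of token vectors, shape (seq_len, dim)
    kernel:   list of weight vectors, shape (width, dim)
    """
    width = len(kernel)
    activations = []
    for start in range(len(embedded) - width + 1):
        total = bias
        for offset in range(width):
            token_vec = embedded[start + offset]
            weights = kernel[offset]
            total += sum(w * x for w, x in zip(weights, token_vec))
        activations.append(max(0.0, total))  # ReLU
    # A sequence shorter than the filter yields no window; return 0.0 in that case.
    return max(activations) if activations else 0.0


def textcnn_features(tokens, embeddings, kernels, unk="<unk>"):
    """Look up embeddings, then produce one pooled feature per filter."""
    embedded = [embeddings.get(tok, embeddings[unk]) for tok in tokens]
    return [conv_max_pool(embedded, kernel) for kernel in kernels]


# Hypothetical toy embeddings; a real system would load pretrained word2vec vectors.
toy_embeddings = {
    "cough": [1.0, 0.0],
    "night": [0.0, 1.0],
    "wheeze": [1.0, 1.0],
    "<unk>": [0.0, 0.0],
}
# Hypothetical filters: one of width 2 and one of width 3.
toy_kernels = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],
]

features = textcnn_features(
    ["cough", "night", "wheeze", "fever"], toy_embeddings, toy_kernels
)
print(features)
```

In a full model, the pooled features from many filters would be concatenated and passed to a task-specific output layer, so the same embeddings can serve both disease prediction and examination recommendation.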
Results
We implemented five algorithms for disease prediction. In this task, our algorithm outperformed the baseline methods on all metrics, with a top-1 accuracy (AC) of 0.68 and a top-3 AC of 0.923 on the testing set. With data enhancement, the top-3 AC reached 0.926. In the examination recommendation task, the overall AC on the testing set was 0.93 and the macro average (MA) F1-score was 0.88. The average area under the curve (AUC) was 0.97 on the training set and 0.86 on the testing set.
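For readers unfamiliar with the metrics reported here, the following sketch shows how top-k accuracy and a macro-averaged F1-score are commonly computed. It assumes a single-label setting with hypothetical toy scores; the evaluation code and the exact setup used for examination recommendation are not given in the source.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical class scores for four samples over three candidate diseases.
toy_scores = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.4, 0.5],
    [0.5, 0.1, 0.4],
]
toy_labels = [0, 2, 2, 2]
# Top-1 prediction is the index of the highest score in each row.
toy_preds = [row.index(max(row)) for row in toy_scores]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
f1 = macro_f1(toy_labels, toy_preds)
print(top1, top2, round(f1, 4))
```

Top-k accuracy is never lower than top-1 accuracy for k greater than 1, which is consistent with the gap between the reported top-1 and top-3 values: the correct diagnosis often appears among the leading candidates even when it is not ranked first.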
Conclusions
We constructed an intelligent prediagnosis system with an MSCNN framework that can predict diseases and recommend examinations based on EMR data. Our approach achieved good results on a retrospective clinical dataset and therefore has potential for automated diagnostic assistance during the prediagnosis stage of clinical practice, where it could help primary care doctors and doctors in basic-level hospitals. Because the proposed framework is general, it can be straightforwardly extended to prediagnosis for other diseases.