BACKGROUND In emergency departments (EDs), timely rescue is critical because patients’ conditions often deteriorate rapidly, and early diagnosis can increase patients’ chances of survival. Predictive models that apply machine learning to Electronic Medical Record (EMR) data can improve early diagnosis. However, ED data are usually imbalanced, contain missing values, and have sparse features. These quality issues make it challenging to build early identification models for diseases in the ED. OBJECTIVE This study proposes a systematic approach to handling missing values, class imbalance, and sparse features in ED data. METHODS We used random forest and K-means algorithms to impute missing values and to under-sample the data, respectively. To address sparse features, we used principal component analysis to reduce dimensionality. Imputation performance was evaluated with the coefficient of determination (R²) for continuous variables and the Kappa coefficient for discrete variables. The area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUPRC) were used to estimate model performance. To further evaluate the proposed approach, we carried out a case study using an ED dataset extracted from Hainan Hospital of Chinese PLA General Hospital. A logistic regression model for predicting worsening of patient condition was built from the data processed by the proposed approach. RESULTS Data were collected from 1085 patients with a rescue record and 17959 patients without one, a highly imbalanced distribution. In total, 275, 402, and 891 variables were extracted from laboratory tests, medications, and diagnoses, respectively. After data preprocessing, the median R² of random forest imputation for continuous variables was 0.623 (IQR: 0.647), and the median Kappa coefficient of imputation for discrete variables was 0.444 (IQR: 0.285).
The logistic regression model built on the initial diagnostic data performed poorly and exhibited variable separation, reflected in abnormally high odds ratios (ORs) for two variables, cardiac arrest and respiratory arrest (27857.4 and 9341.6, respectively), and in abnormal confidence intervals. Using the processed data, the model achieved a recall of 0.77, an F1 score of 0.74, and an AUC of 0.64. CONCLUSIONS We proposed a machine learning method to address data quality issues in emergency data, namely missing data, class imbalance, and sparse features, and thereby improve data availability. A preliminary case study indicates that the results produced by the proposed method can be used to build prediction models for emergency patients.
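The two imputation metrics named in the abstract above, the coefficient of determination and Cohen's Kappa, can be illustrated with a minimal sketch. It assumes that imputation quality is checked by hiding some known values, imputing them, and comparing the imputed values with the hidden ones; the abstract does not spell out this procedure. The function names and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter


def r_squared(y_true, y_pred):
    """Coefficient of determination for continuous variables.

    Assumes y_true is not constant (otherwise the denominator is zero).
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for discrete variables: agreement corrected for chance.

    Assumes chance agreement is below 1 (otherwise the denominator is zero).
    """
    n = len(labels_a)
    # Observed agreement: fraction of positions where the labels match.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from the marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical held-out values and their imputed counterparts.
held_out_continuous = [4.0, 5.0, 6.0]
imputed_continuous = [4.5, 5.0, 5.5]
held_out_discrete = ["yes", "no", "yes", "no", "yes", "no"]
imputed_discrete = ["yes", "no", "no", "no", "yes", "yes"]

r2 = r_squared(held_out_continuous, imputed_continuous)
kappa = cohen_kappa(held_out_discrete, imputed_discrete)
print(f"R2 = {r2:.3f}, Kappa = {kappa:.3f}")
```

Both metrics equal 1 for perfect imputation. A Kappa of 0 means agreement no better than chance, which is why it is a stricter measure than raw accuracy for discrete variables with skewed categories.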
This simulation study indicates that an intelligent DCC significantly increases compliance with best practice by reducing the percentage of unchecked items during ICU ward rounds, while the user satisfaction rate remains high. Real-life clinical research is required to evaluate this new type of checklist further.
Background The coronavirus disease (COVID-19) was discovered in China in December 2019. It has developed into a threatening international public health emergency. With the exception of China, the number of cases continues to increase worldwide. A number of studies on disease diagnosis and treatment have been carried out, and many clinically proven effective results have been achieved. Although information technology can speed the transfer of such knowledge into clinical practice, data interoperability remains a challenge because hospital information systems are heterogeneous. The problem becomes more serious when the knowledge for diagnosis and treatment is updated rapidly, as is the case for COVID-19. An open, semantic-sharing, and collaborative information modeling framework is needed to rapidly develop a shared data model for exchanging data among systems. openEHR is such a framework and is supported by many open software packages that help to promote information sharing and interoperability. Objective This study aims to develop a shared data model based on the openEHR modeling approach to improve interoperability among systems for the diagnosis and treatment of COVID-19. Methods The latest Guideline of COVID-19 Diagnosis and Treatment in China was selected as the knowledge source for modeling. First, the guideline was analyzed and the data items used for diagnosis, treatment, and management were extracted. Second, the data items were classified and further organized into domain concepts with a mind map. Third, the international openEHR Clinical Knowledge Manager (CKM) was searched for existing archetypes that could represent the concepts. New archetypes were developed for concepts that could not be found. Fourth, these archetypes were further organized into a template using Ocean Template Editor.
Fifth, a test case of data exchange between the clinical data repository and the clinical decision support system, based on the template, was conducted to verify the feasibility of the study. Results A total of 203 data items were extracted from the guideline in China, and 16 domain concepts (16 leaf nodes in the mind map) were organized. There were 22 archetypes used to develop the template for all data items extracted from the guideline. All of them could be found in the CKM and reused directly. The archetypes and templates were reviewed and finally released in a public project within the CKM. The test case showed that the template can facilitate data exchange and meet the requirements of decision support. Conclusions This study developed an openEHR template for COVID-19 based on the latest guideline from China using the openEHR modeling methodology. It demonstrated the capability of the methodology for rapidly modeling and sharing knowledge by reusing existing archetypes, which is especially useful in a new and fast-changing area such as COVID-19.
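The third modeling step described above, looking up each domain concept among existing archetypes and flagging those that would need a new archetype, can be sketched as follows. This is only an illustration: the in-memory catalog stands in for a search of the CKM, the function name `plan_archetypes` is hypothetical, and the concept names and archetype identifiers are illustrative placeholders that have not been checked against the CKM or the study's template.

```python
def plan_archetypes(concepts, catalog):
    """Split concepts into those with an existing archetype and those without.

    `catalog` maps a lower-cased concept name to an archetype identifier and
    is a stand-in for searching the CKM (an assumption of this sketch).
    """
    reuse = {}    # concept -> existing archetype identifier
    develop = []  # concepts that would need a new archetype
    for concept in concepts:
        archetype_id = catalog.get(concept.lower())
        if archetype_id is None:
            develop.append(concept)
        else:
            reuse[concept] = archetype_id
    return reuse, develop


# Illustrative placeholder data, not taken from the study.
example_catalog = {
    "body temperature": "openEHR-EHR-OBSERVATION.body_temperature.v2",
    "imaging result": "openEHR-EHR-OBSERVATION.imaging_exam_result.v0",
}
example_concepts = ["Body temperature", "Imaging result", "Exposure history"]

reuse, develop = plan_archetypes(example_concepts, example_catalog)
print("reuse:", reuse)
print("develop:", develop)
```

In the study itself, the second list would have been empty, since all 22 archetypes were found in the CKM and reused directly.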