Background: Merging disparate and heterogeneous datasets from clinical routine into a standardized and semantically enriched format, so that the data can be reused for multiple purposes, also means incorporating unstructured data such as medical free texts. Although the extraction of structured data from texts using natural language processing (NLP) has been researched extensively, at least for the English language, obtaining structured output in an arbitrary format is not sufficient. NLP techniques need to be combined with clinical information standards such as openEHR so that previously unstructured data can be reused and exchanged meaningfully. Objectives: The aim of the study is to automatically extract crucial information from medical free texts and to transform this unstructured clinical data into a standardized and structured representation by designing and implementing an exemplary pipeline for processing pediatric medical histories. Methods: We constructed a pipeline that allows medical free texts such as pediatric medical histories to be reused in a structured and standardized way by (1) selecting and modeling appropriate openEHR archetypes as standard clinical information models, (2) defining a German dictionary of crucial text markers that serves as the expert knowledge base for an NLP pipeline, and (3) creating mapping rules between the NLP output and the archetypes. The approach was evaluated in a first pilot study using 50 manually annotated medical histories from the pediatric intensive care unit of Hannover Medical School. Results: We successfully reused 24 existing international archetypes to represent the most crucial elements of unstructured pediatric medical histories in a standardized form. The self-developed NLP pipeline was constructed by defining 3,055 text marker entries, 132 text events, 66 regular expressions, and a text corpus of 776 entries for the automatic correction of spelling mistakes.
A total of 123 mapping rules were implemented to transform the extracted snippets into an openEHR-based representation so that they can be stored together with other structured data in an existing openEHR-based data repository. In the first evaluation, the NLP pipeline achieved 97% precision and 94% recall. Conclusion: The combination of NLP and openEHR archetypes was demonstrated to be a viable approach for extracting important information from pediatric medical histories and representing it in a structured and semantically enriched format. We designed a promising approach that has the potential to be generalized, and implemented a prototype that is extensible and reusable for other use cases involving German medical free texts. In the long term, this will harness unstructured clinical data for further research purposes such as the design of clinical decision support systems. Together with structured data already integrated in openEHR-based representations, we aim to develop an interoperable openEHR-based application capable of automatically assessing a patient's risk status based on the patient's medical history at the time of admission.
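The extraction, mapping and evaluation steps described above can be illustrated with a minimal Python sketch. It assumes a simple token-level design: a spelling-correction table, a dictionary of text markers, one regular expression, and mapping rules that assign each extracted concept to a flat, openEHR-style path. All dictionary entries, the regular expression and the paths (e.g. `medical_history/problem_diagnosis/diagnosis_name`) are hypothetical placeholders, not the paper's actual resources or real archetype paths, and the paper's text events are not modeled. The last step computes precision and recall against a manual annotation, the two metrics reported for the pilot evaluation.

```python
import re

# Hypothetical spelling-correction corpus (misspelling -> correction);
# the paper's corpus has 776 entries, these two are illustrative only.
SPELLING_CORRECTIONS = {"frühgebohren": "frühgeboren", "astma": "asthma"}

# Hypothetical text-marker dictionary (German token -> normalized concept).
TEXT_MARKERS = {
    "frühgeboren": "preterm_birth",
    "asthma": "asthma",
    "kaiserschnitt": "caesarean_section",
}

# Hypothetical regular expression for a birth weight in grams.
BIRTH_WEIGHT_RE = re.compile(r"geburtsgewicht\s+(?:von\s+)?(\d{3,4})\s*g\b")

# Hypothetical mapping rules: concept -> simplified flat openEHR-style path.
CONCEPT_PATHS = {
    "preterm_birth": "medical_history/problem_diagnosis/diagnosis_name",
    "asthma": "medical_history/problem_diagnosis/diagnosis_name",
    "caesarean_section": "birth_history/procedure/procedure_name",
}
BIRTH_WEIGHT_PATH = "birth_history/birth_weight"


def correct_spelling(text: str) -> str:
    """Lowercase, split into word tokens and replace known misspellings."""
    tokens = re.findall(r"\w+", text.lower())
    return " ".join(SPELLING_CORRECTIONS.get(tok, tok) for tok in tokens)


def extract(text: str) -> set[tuple[str, str]]:
    """Return (path, value) pairs found in one free-text medical history."""
    corrected = correct_spelling(text)
    found = set()
    for token in corrected.split():  # dictionary lookup of single-token markers
        concept = TEXT_MARKERS.get(token)
        if concept is not None:
            found.add((CONCEPT_PATHS[concept], concept))
    for match in BIRTH_WEIGHT_RE.finditer(corrected):  # regex-based numeric findings
        found.add((BIRTH_WEIGHT_PATH, f"{match.group(1)} g"))
    return found


def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Precision and recall of extracted pairs against a manual annotation."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


history = "Frühgebohren, Kaiserschnitt, Geburtsgewicht 2450 g. Astma, Salbutamol bei Bedarf."
pred = extract(history)
gold = {
    (CONCEPT_PATHS["preterm_birth"], "preterm_birth"),
    (CONCEPT_PATHS["caesarean_section"], "caesarean_section"),
    (CONCEPT_PATHS["asthma"], "asthma"),
    (BIRTH_WEIGHT_PATH, "2450 g"),
    ("medical_history/medication/medication_name", "salbutamol"),  # not in the dictionary
}
print(precision_recall(pred, gold))  # (1.0, 0.8)
```

Because the sketch matches single tokens, it would also extract "asthma" from a negated phrase such as "kein Asthma"; a production pipeline needs context rules for negation, and repeated entries such as several diagnoses would need indexed paths in a real openEHR composition.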
Background: To address the need for freely accessible training data sets originating from the real world, the ELISE project integrates source data from a pediatric intensive care unit and makes them available to researchers. Objective: We present our vision, initial results and the steps towards an evolving open pediatric intensive care data set. Methods: Our evolution plan for the data set comprises three steps. The final data set will include raw clinical data and labels for critical outcomes such as organ dysfunction and sepsis, generated automatically by computerized, well-evaluated methods. Results: The first step resulted in an initial version of the data set, available in a central repository. Conclusions: Our approach has great potential to provide a comprehensive open intensive care data set labeled for critical pediatric outcomes and, thus, to help overcome the current lack of real-world pediatric intensive care data usable for training data-driven algorithms.
Introduction: Systemic inflammatory response syndrome (SIRS), sepsis and associated organ dysfunctions are life-threatening conditions occurring in paediatric intensive care units (PICUs). Early recognition and treatment within the first hours of onset are critical. However, time pressure, limited personnel resources and the need for complex age-dependent diagnoses impede accurate and timely diagnosis by PICU physicians. Data-driven prediction models integrated in clinical decision support systems (CDSS) could facilitate early recognition of disease onset. Objectives: To estimate the sensitivity and specificity of previously developed prediction models (index tests) for the detection of SIRS, sepsis and associated organ dysfunctions in critically ill children up to 12 hours before a reference standard diagnosis is possible. Methods and analysis: We conduct a monocentre, prospective diagnostic test accuracy study. Clinicians in the PICU of the tertiary care centre Hannover Medical School, Germany, continuously screen and recruit patients until the adaptive sample size (originally intended to be 500 patients) is reached. Eligible are children (0–17 years, all sexes) who stay in the PICU for ≥12 hours and for whom informed consent is given. All eligible patients are independently assessed for SIRS, sepsis and organ dysfunctions using the corresponding predictive and knowledge-based CDSS models. The knowledge-based CDSS models serve as imperfect reference standards. The assessments are used to estimate the sensitivity and specificity of each predictive model using a clustered nonparametric approach (main analysis). Subgroup analyses (‘age groups’, ‘sex’ and ‘age groups by sex’) are predefined. Ethics and dissemination: This study obtained ethics approval from the Hannover Medical School Ethics Committee (No. 10188_BO_SK_2022).
Results will be disseminated in peer-reviewed publications, at scientific conferences, and to patients through an appropriate dissemination approach. Trial registration number: This study was registered with the German Clinical Trial Register (DRKS00029071) on 2022-05-23. Protocol version: 10188_BO_SK_2022_V.2.0–20220330_4_Studienprotokoll.
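The sensitivity and specificity estimation described in this protocol can be sketched in Python. Because each child contributes several time-point assessments, observations within a patient are correlated, so the sketch treats patients as clusters: it pools sensitivity and specificity as ratio estimators and uses a cluster-robust variance for a Wald confidence interval. This is one common nonparametric choice for clustered binary data, assumed here for illustration; the protocol's exact estimator, its handling of the imperfect reference standard, and the function names `clustered_rate` and `sensitivity_specificity` are hypothetical and not taken from the source. The sketch simply treats the reference standard as correct.

```python
import math
from collections import defaultdict


def clustered_rate(counts: dict[str, tuple[int, int]], z: float = 1.96) -> tuple[float, float, float]:
    """Pooled rate with a cluster-robust Wald interval.

    counts maps a patient (cluster) id to (hits, trials). Returns
    (estimate, lower, upper); clusters without trials are skipped.
    """
    items = [(x, m) for x, m in counts.values() if m > 0]
    total = sum(m for _, m in items)
    if total == 0:
        return float("nan"), float("nan"), float("nan")
    p = sum(x for x, _ in items) / total  # ratio estimator across all clusters
    k = len(items)
    if k < 2:  # the cluster-robust variance needs at least two clusters
        return p, float("nan"), float("nan")
    var = k / (k - 1) * sum((x - p * m) ** 2 for x, m in items) / total**2
    half = z * math.sqrt(var)
    return p, max(0.0, p - half), min(1.0, p + half)


def sensitivity_specificity(assessments):
    """assessments: iterable of (patient_id, index_positive, reference_positive).

    Assumption: the (imperfect) reference standard is treated as ground truth.
    """
    sens = defaultdict(lambda: [0, 0])  # patient -> [true positives, reference positives]
    spec = defaultdict(lambda: [0, 0])  # patient -> [true negatives, reference negatives]
    for patient, index_pos, ref_pos in assessments:
        if ref_pos:
            sens[patient][0] += int(index_pos)
            sens[patient][1] += 1
        else:
            spec[patient][0] += int(not index_pos)
            spec[patient][1] += 1
    return (
        clustered_rate({pid: (h, n) for pid, (h, n) in sens.items()}),
        clustered_rate({pid: (h, n) for pid, (h, n) in spec.items()}),
    )


# Hypothetical assessments of three patients at several time points.
assessments = [
    ("p1", True, True), ("p1", True, True), ("p1", False, False),
    ("p2", False, True), ("p2", False, False), ("p2", False, False),
    ("p3", True, False), ("p3", False, False),
]
sens, spec = sensitivity_specificity(assessments)
print(f"sensitivity {sens[0]:.2f}, specificity {spec[0]:.2f}")  # sensitivity 0.67, specificity 0.80
```

Pooling hits over all assessments weights patients by their number of time points, while the cluster-robust variance widens the interval to reflect that repeated assessments of the same child are not independent.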