CogStack - Experiences Of Deploying Integrated Information Retrieval And Extraction Services In A Large National Health Service Foundation Trust Hospital

Jackson, Roy; Kartoglu, Ismail E.; Agrawal, Asha; Lui, Kenneth; Wu, Honghan; Groza, Tudor; Roberts, Angus; Gorrell, Genevieve; Song, Xingyi; Lewsley, Damian; Northwood, Doug; Folarin, Amos; Stringer, Clive; Stewart, Robert; Dobson, Richard

doi:10.1101/123299

Cited by 8 publications

(12 citation statements)

References 29 publications


“…Two clinicians (blinded to the ICD-10 and OPCS-4 codes recorded) reviewed the entire hospital record (charts, referral letters, discharge letters, imaging reports) for 283 patient hospital episodes from two large NHS Trusts (University College London Hospitals NHS Foundation Trust and King's College Hospital NHS Foundation Trust). The hospital record corpus (14,364,947 words in total) was made available as a single text file per patient, through the use of CogStack (39), an enterprise-wide retrieval and extraction architecture for structured and unstructured information which integrates data across multiple EHR systems in a hospital. Patient consent for reviewing these records was provided from the NIHR funded SIGNUM study of stroke patients.…”

Section: Methods (mentioning)

confidence: 99%

Bleeding in cardiac patients prescribed antithrombotic drugs: Electronic health record phenotyping algorithms, incidence, trends and prognosis

Pasea, Chung, Pujades-Rodriguez, et al. 2019

Preprint

Self Cite


Background: Clinical guidelines and public health authorities lack recommendations on scalable approaches to defining and monitoring the occurrence and severity of bleeding in populations prescribed antithrombotic therapy. We aimed to develop electronic health record algorithms for different bleeding phenotypes, and to determine the incidence, time trends and prognosis of bleeding in patients with incident cardiac disorders indicated for antiplatelet and/or vitamin K antagonist (VKA) therapy.

Methods: We examined linked primary care, hospital admission and death registry electronic health records (CALIBER 1998-2010, England) of patients with newly diagnosed atrial fibrillation, acute myocardial infarction, unstable angina or stable angina to develop algorithms for bleeding events. Kaplan-Meier plots were used to estimate the incidence of bleeding events and we used Cox regression models to assess prognosis for all-cause mortality, atherothrombotic events and further bleeding following bleeding events.

Results: We present electronic health record phenotyping algorithms for bleeding based on bleeding diagnosis in primary or hospital care, symptoms, transfusion, surgical procedures, and haemoglobin values. In validation of the phenotype we estimated a positive predictive value of 0.88 (95% CI: 0.64, 0.99) for hospitalised bleeding. Amongst 128,815 patients, 27,259 (21.2%) had at least one bleeding event, with 5 year risks of bleeding of 29.1%, 21.9%, 25.3% and 23.4% following diagnoses of atrial fibrillation, acute myocardial infarction, unstable angina and stable angina respectively. Rates of hospitalised bleeding per 1000 patients more than doubled from 1.02 (95% CI: 0.83, 1.22) in January 1998 to 2.68 (95% CI: 2.49, 2.88) in December 2009, coinciding with increased rates of antiplatelet and VKA prescribing. Patients with hospitalised bleeding and primary care bleeding, with or without markers of severity, were at increased risk of all-cause mortality and atherothrombotic events compared to those with no bleeding. For example, the hazard ratio for all-cause mortality was 1.98 (95% CI: 1.86, 2.11) for primary care bleeding with markers of severity, and 1.99 (95% CI: 1.92, 2.05) for hospitalised bleeding without markers of severity, compared to patients with no bleeding.

Conclusions: Electronic health record bleeding phenotyping algorithms offer a scalable approach to monitoring bleeding in the population. The incidence of bleeding has doubled since 1998, affects 1 in 4 cardiac patients, and is associated with poor prognosis. Efforts are required to tackle this iatrogenic epidemic.

What is already known?

Clinical guidelines and public health authorities lack recommendations on how to define or monitor the occurrence and severity of bleeding in populations.

This is particularly important because clinical guidelines increasingly recommend the use of one, two or three antiplatelet and vitamin K antagonist drugs to lower the risk of subsequent atherothrombotic events in common heart diseases including atrial fibrillation, acute coronary syndromes and chronic stable angina.

Clinical guidelines lack consistent recommendations on how to reduce the main side effect of bleeding.

For acute myocardial infarction it has been shown that combining primary care electronic health records (which include information from hospital discharge summaries) and hospital admission data can generate valid EHR disease phenotypes and provide real-world estimates of disease occurrence.

What is not known?

It is not known how to define bleeding occurrence and severity in large scale, unselected populations by combining available information on bleeding diagnosis in primary or hospital care, symptoms, transfusion, surgical procedures, and haemoglobin values.

The population-based incidence, time trends and long-term prognosis of bleeding have not been evaluated in people with common cardiac disorders.

Comparisons of the population burden of bleeding across common cardiac disorders, such as atrial fibrillation, acute coronary syndromes and stable angina, are lacking.

What this study adds?

Phenotype: We developed standardised replicable EHR phenotyping algorithms defining bleeding and severity measures based on available clinical information across structured primary and hospital care EHR sources.

Incidence: At 5 years of follow-up, one in five patients with cardiac disease had a bleeding event and 6.5% had fatal or severe bleeding.

Trends: There was approximately a two-fold increase in incidence of primary care and hospitalised bleeding between 1998 and 2010. The rate of fatal bleeding remained stable.

Prognosis: Patients with bleeding recorded in primary care or in hospital admissions are at increased risk of all-cause death and atherothrombotic events.



“…Bleeding assignments from the clinicians' review were compared with those from the phenotyping algorithm and we estimated the PPV, NPV, sensitivity and specificity using the case review data as the "gold standard". We extracted hospital data (14,364,947 words) using CogStack [57] from the consented Stroke InvestiGation Network-Understanding Mechanisms (SIGNUM) study.…”

Section: Cross-EHR Source Concordance (mentioning)

confidence: 99%
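The four validation measures named in the statement above (PPV, NPV, sensitivity and specificity against a clinician-review gold standard) can be sketched as follows. This is a minimal illustration only, not the authors' code: the function name `validation_metrics`, the `ValidationMetrics` container and the toy labels are all hypothetical, and no values from the study are reproduced.

```python
from dataclasses import dataclass


@dataclass
class ValidationMetrics:
    ppv: float
    npv: float
    sensitivity: float
    specificity: float


def _ratio(numerator, denominator):
    # Return NaN when a measure is undefined (empty denominator).
    return numerator / denominator if denominator else float("nan")


def validation_metrics(algorithm_labels, review_labels):
    """Compare algorithm output with clinician review (treated as gold standard).

    Both arguments are equal-length sequences of 0/1 flags, one per episode
    (1 = bleeding assigned, 0 = no bleeding assigned).
    """
    if len(algorithm_labels) != len(review_labels):
        raise ValueError("label sequences must have the same length")

    pairs = list(zip(algorithm_labels, review_labels))
    tp = sum(1 for a, r in pairs if a and r)          # both say bleeding
    fp = sum(1 for a, r in pairs if a and not r)      # algorithm only
    fn = sum(1 for a, r in pairs if not a and r)      # review only
    tn = sum(1 for a, r in pairs if not a and not r)  # neither

    return ValidationMetrics(
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
    )


# Toy, invented labels for ten episodes (not study data).
algorithm = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
review = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = validation_metrics(algorithm, review)
print(metrics)
```

With these toy labels there are 3 true positives, 1 false positive, 1 false negative and 5 true negatives, so PPV and sensitivity are both 3/4 while NPV and specificity are both 5/6.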

UK phenomics platform for developing and validating EHR phenotypes: CALIBER

Denaxas, González-Izquierdo, Direk, et al. 2019

Preprint

Self Cite


Objective: Electronic Health Records (EHR) are a rich source of information on human diseases, but the information is variably structured, fragmented, curated using different coding systems and collected for purposes other than medical research. We describe an approach for developing, validating and sharing reproducible phenotypes from national structured EHR in the United Kingdom (UK) with applications for translational research.

Materials and Methods: We implemented a rule-based phenotyping framework, with up to six approaches of validation. We applied our framework to a sample of 15 million individuals in a national EHR data source (population-based primary care, all ages) linked to hospitalization and death records in England. Data comprised continuous measurements (e.g. blood pressure), medication information and coded diagnoses, symptoms, procedures and referrals, recorded using five controlled clinical terminologies: a) Read (primary care, subset of SNOMED-CT), b) International Classification of Diseases 9th/10th Revision (ICD-9, ICD-10, secondary care diagnoses and cause of mortality), c) OPCS Classification of Interventions and Procedures (OPCS-4, hospital surgical procedures), and d) DM+D prescription codes.

Results: Using the CALIBER phenotyping framework, we created algorithms for 51 diseases, syndromes, biomarkers and lifestyle risk factors and provide up to six validation approaches. The EHR phenotypes are curated in the open-access CALIBER Portal (https://www.caliberresearch.org/portal) and have been used by 40 national/international research groups in 60 peer-reviewed publications.


“…data retrieval, information extraction and semantic indexing. CogStack [14], a data harmonisation and enterprise search toolkit for EHRs, is adopted in the data retrieval step to provide a unified interface to unstructured EHR data, which is often very heterogeneous in format and distributed in storage. Each document that flows out from medical history, laboratory results); the continuous learning subsystem (to be described in next subsection) learns the contexts from user assessed annotations (see Supplementary Material 1 for detail).…”

Section: The Producing Subsystem (mentioning)

confidence: 99%

“…To realise a general-purpose biomedical information extraction (IE) system on EHRs, there are at least three fundamental challenges: a) syntactic heterogeneity: how to effectively access multi-modal/multisource EHR data that are almost certainly heterogeneous in formats, data models and access interfaces; b) knowledge coverage: how to cover all possible biomedical concepts that are required by potential use cases; c) context capturing: how to represent and capture the contexts associated with extracted concepts, and which are critical to understanding the clinical domain. To address these challenges, SemEHR architects a production infrastructure that integrates our previous work in the CogStack pipeline [14] to harmonise and cleanse heterogeneous records, using them to identify contextualised mentions (negation, temporality and experiencer) of a wide range of biomedical concepts including SNOMED CT, ICD-10, LOINC and Drug Ontology. In addition, SemEHR automatically associates semantic types of annotations and their clinical contexts (derived from containing documents or sections) with dedicated extraction rules, which enables better IE capabilities such as populating the structured vital sign data from observation notes.…”

mentioning

confidence: 99%

SemEHR: A General-purpose Semantic Search System to Surface Semantic Data from Clinical Notes for Tailored Care, Trial Recruitment and Clinical Research

Toti, Morley, et al. 2017

Preprint

Self Cite


Objective: Unlocking the data contained within both structured and unstructured components of Electronic Health Records (EHRs) has the potential to provide a step change in data available for secondary research use, generation of actionable medical insights, hospital management and trial recruitment. To achieve this, we implemented SemEHR, an open source semantic search and analytics tool for EHRs.

Methods: SemEHR implements a generic information extraction (IE) and retrieval infrastructure by identifying

* The manuscript has been submitted to JAMIA

