Using natural language processing to extract structured epilepsy data from unstructured clinic letters: development and validation of the ExECT (extraction of epilepsy clinical text) system

Fonferko‐Shadrach, Beata; Lacey, Arron; Roberts, Angus; Akbari, Ashley; Thompson, Simon; Ford, David; Lyons, Ronan A; Rees, Mark I.; Pickrell, William Owen

doi:10.1136/bmjopen-2018-023232

Cited by 48 publications

(46 citation statements)

References 16 publications


“…SAIL Databank took this approach because of the risks of introducing insufficiently deidentified information into the databank. Working with free-text data at source is by means of an NHS honorary contract [ 8 ], and all proposals to use SAIL data must have received approval from an independent IGRP before access can be granted via the data safe haven [ 77 ].…”

Section: Results (mentioning)

confidence: 99%

“…Alternative methods focus only on isolating the relevant clinical information from personal identifiers via extraction of specified variables such as medication dosage instructions or diagnoses, which are whitelisted and preserved in text. Whitelisting can be thought of as the converse of blacklisting in that it extracts clinically informative data rather than excluding disallowed pieces of information [ 6 - 8 ].…”

Section: Introduction (mentioning)

confidence: 99%
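
The contrast drawn in the statement above between blacklisting and whitelisting can be sketched in a few lines. This is a minimal illustration only: the two regular expressions, the function names, and the sample letter are hypothetical and are not taken from the cited papers or from any deidentification tool.

```python
import re

# Hypothetical patterns, for illustration only (not from the cited work).
# DOSAGE: a drug name followed by a dose in mg and a simple frequency.
DOSAGE = re.compile(r"\b[A-Za-z]+ \d+\s?mg (?:once|twice) daily\b")
# NAME: a title followed by a capitalised surname.
NAME = re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.? [A-Z][a-z]+\b")


def blacklist(text: str) -> str:
    """Blacklisting: redact disallowed items (titled names), keep the rest."""
    return NAME.sub("[REDACTED]", text)


def whitelist(text: str) -> list[str]:
    """Whitelisting: keep only specified items (dosage instructions)."""
    return DOSAGE.findall(text)


letter = "Mr Smith continues lamotrigine 100 mg twice daily."
print(blacklist(letter))  # free text survives, minus the matched name
print(whitelist(letter))  # only the extracted dosage instruction survives
```

Under these assumptions, blacklisting returns the whole sentence with the name replaced, whereas whitelisting returns only the dosage phrase and discards everything else, including any identifier the blacklist pattern would have missed.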

“…It is important to note that deidentification and extraction algorithms do not work out of the box but often have to be built and tested on specific data annotated by domain-specific experts to train and develop the algorithms. In general, algorithms can be trained to work to a standard comparable with that of a human annotator, but accuracy can decrease with increasing information complexity [ 6 - 8 ].…”

Section: Introduction (mentioning)

confidence: 99%


Toward the Development of Data Governance Standards for Using Clinical Free-Text Data in Health Research: Position Paper

Jones, Ford, Lea et al. 2020. J Med Internet Res

Background: Clinical free-text data (eg, outpatient letters or nursing notes) represent a vast, untapped source of rich information that, if more accessible for research, would clarify and supplement information coded in structured data fields. Data usually need to be deidentified or anonymized before they can be reused for research, but there is a lack of established guidelines to govern effective deidentification and use of free-text information and avoid damaging data utility as a by-product.

Objective: This study aimed to develop recommendations for the creation of data governance standards to integrate with existing frameworks for personal data use, to enable free-text data to be used safely for research for patient and public benefit.

Methods: We outlined data protection legislation and regulations relating to the United Kingdom for context and conducted a rapid literature review and UK-based case studies to explore data governance models used in working with free-text data. We also engaged with stakeholders, including text-mining researchers and the general public, to explore perceived barriers and solutions in working with clinical free-text.

Results: We proposed a set of recommendations, including the need for authoritative guidance on data governance for the reuse of free-text data, to ensure public transparency in data flows and uses, to treat deidentified free-text data as potentially identifiable with use limited to accredited data safe havens, and to commit to a culture of continuous improvement to understand the relationships between the efficacy of deidentification and reidentification risks, so this can be communicated to all stakeholders.

Conclusions: By drawing together the findings of a combination of activities, we present a position paper to contribute to the development of data governance standards for the reuse of clinical free-text data for secondary purposes. While working in accordance with existing data governance frameworks, there is a need for further work to take forward the recommendations we have proposed, with commitment and investment, to assure and expand the safe reuse of clinical free-text data for public benefit.


“…[27] The most comprehensive pipeline specific to the field of epilepsy is ExECT (extraction of epilepsy clinical text). [28] This extracts a diagnosis of epilepsy as a binary field (88% precision, 89% recall), focal seizures as a binary field (96% precision, 70% recall), generalized seizures as a binary field (89% precision, 52% recall), and epilepsy type as a trinary field defined as either focal, generalized, or absence (90% precision, 80% recall).…”

Section: Epilepsy Type and Seizure Type (mentioning)

confidence: 99%
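
The statement above gives precision and recall for each extracted field but no combined score. As a rough aid to comparison, the sketch below derives the F1 score (the harmonic mean of precision and recall) from those figures. The F1 values are computed here, not reported in the statement, and the input figures are copied as quoted by the citing review rather than checked against the ExECT paper; the function and variable names are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as quoted in the citation statement above.
reported = {
    "epilepsy diagnosis": (0.88, 0.89),
    "focal seizures": (0.96, 0.70),
    "generalized seizures": (0.89, 0.52),
    "epilepsy type": (0.90, 0.80),
}

# Derived values, not figures published in the quoted text.
derived_f1 = {field: f1_score(p, r) for field, (p, r) in reported.items()}

for field, score in derived_f1.items():
    print(f"{field}: F1 = {score:.2f}")
```

Because the harmonic mean is pulled toward the lower of the two inputs, the fields with high precision but low recall (focal and generalized seizures) score noticeably lower than their precision alone would suggest.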

Can antiepileptic drug efficacy be studied from electronic health records? A review of current approaches

Decker, Hill, Baldassano et al. 2020. Preprint

As automated data extraction and natural language processing (NLP) are rapidly evolving, applicability to harness large data to improve healthcare delivery is garnering great interest. Assessing antiepileptic drug (AED) efficacy remains a barrier to improving epilepsy care. In this review, we examined automatic electronic health record (EHR) extraction methodologies pertinent to epilepsy examining AED efficacy. We also reviewed more generalizable NLP pipelines to extract other critical patient variables. Our review found varying reports of performance measures. Whereas automated data extraction pipelines are a crucial advancement, this review calls attention to standardizing NLP methodology and accuracy reporting for greater generalizability. Moreover, the use of crowdsourcing competitions to spur innovative NLP pipelines would further advance this field.


“…14 Another example is the extraction of epilepsy variables from clinical reports using ExECT (extraction of epilepsy clinical text), a system based on GATE. 15 Another approach to extract information from texts is the use of machine learning, which is often done with classical approaches, such as support vector machines or logistic regression. 8,9,11 In recent years, newer deep learning approaches have also been evaluated for information extraction.…”

Section: Related Work (mentioning)

confidence: 99%

Information Extraction from Echocardiography Reports for a Clinical Follow-up Study—Comparison of Extracted Variables Intended for General Use in a Data Warehouse with Those Intended Specifically for the Study

et al. 2019


Background: The interest in information extraction from clinical reports for secondary data use is increasing. But experience with the productive use of information extraction processes over time is scarce. A clinical data warehouse has been in use at our university hospital for several years, which also provides an information extraction of echocardiography reports developed for general use.

Objectives: This study aims to illustrate the difficulties encountered, while using data from a preexisting information extraction process for a large clinical study. To compare the data from the preexisting process with the data obtained from a specially developed process designed to improve the quality and completeness of the study data.

Methods: We extracted the echocardiography variables for 440 patients from the general-use information extraction of the data warehouse (678 reports). Then we developed an information extraction process for the same variables but specifically for this study, with the aim to extract as much information as possible from the text. The extracted data of both processes were compared with a newly created gold standard defined by a cardiologist with long-standing experience in heart failure.

Results: Among 57 echocardiography variables considered relevant for the study, 50 were documented in the routine text reports and could be extracted. Twenty of the required variables were not provided by the general-use extraction process, some others were not provided correctly. The median macro F1-score (precision, recall) across the 30 variables for which values were extracted was 0.81 (0.94, 0.77). Across all 50 variables, as relevant for the study, median macro F1-score was only 0.49 (0.56, 0.46). Employing the study-specific approach considerably improved the quality and completeness of the variables, resulting in F1-scores of 0.97 (0.98, 0.96) across all variables.

Conclusion: Data from information extractions can be used for large clinical studies. However, preexisting information extraction processes should be treated with caution, as the time and effort spent defining each variable in the information extraction process may not be clear.
