What Every Reader Should Know About Studies Using Electronic Health Record Data but May Be Afraid to Ask

Kohane, Isaac S.; Aronow, Bruce J.; Avillach, Paul; Beaulieu‐Jones, Brett K.; Bellazzi, Riccardo; Bradford, Robert L.; Brat, Gabriel; Cannataro, Mario; Cimino, James J.; Barrio, Noelia García; Gehlenborg, Nils; Ghassemi, Marzyeh; Gutiérrez-Sacristán, Alba; Hanauer, David A.; Holmes, John H.; Hong, Chuan; Klann, Jeffrey G.; Loh, Ne Hooi Will; Luo, Yuan; Mandl, Kenneth D.; Daniar, Mohamad; Moore, Jason H.; Murphy, Shawn N.; Neuraz, Antoine; Ngiam, Kee Yuan; Omenn, Gilbert S.; Palmer, Nathan; Patel, Lav P.; Pedrera-Jiménez, Miguel; Sliz, Piotr; South, Andrew M.; Tan, Amelia L. M.; Taylor, Deanne; Taylor, Bradley; Torti, Carlo; Vallejos, Andrew K.; Wagholikar, Kavishwar B.; Weber, Griffin M; Cai, Tianxi

doi:10.2196/22219

Cited by 90 publications

(85 citation statements)

References 26 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…25,38 Further, evaluating data provenance enables the reporting of data characteristics in EHR centric studies through presenting data completeness, data collection and handling, and the types of data. 36 The DQ failures identified in our data were of the typology well represented in the literature. [13][14][15][16][17][18][19][20] Missingness is a common challenge in the secondary use of EHR, and its causes include data being digitized after start of study, 14 and recording in free form clinical notes.…”

Section: Discussionmentioning

confidence: 55%

“…The questions were guided by the why, how, and who of data recording, to interpret the data in the context of its generation, and subsequent lifecycle. Data provenance has been highlighted as an important component of EHR secondary use, [35][36][37] and other studies have documented the consideration of information flow as necessary for robust secondary use and avoiding biases. 25,38 Further, evaluating data provenance enables the reporting of data characteristics in EHR centric studies through presenting data completeness, data collection and handling, and the types of data.…”

Section: Discussionmentioning

confidence: 99%

See 1 more Smart Citation

A Systematic Approach to Reconciling Data Quality Failures: Investigation Using Spinal Cord Injury Data

2021

View full text Add to dashboard Cite

Background Secondary use of electronic health record's (EHR) data requires evaluation of data quality (DQ) for fitness of use. While multiple frameworks exist for quantifying DQ, there are no guidelines for the evaluation of DQ failures identified through such frameworks. Objectives This study proposes a systematic approach to evaluate DQ failures through the understanding of data provenance to support exploratory modeling in machine learning. Methods Our study is based on the EHR of spinal cord injury inpatients in a state spinal care center in Australia, admitted between 2011 and 2018 (inclusive), and aged over 17 years. DQ was measured in our prerequisite step of applying a DQ framework on the EHR data through rules that quantified DQ dimensions. DQ was measured as the percentage of values per field that meet the criteria or Krippendorff's α for agreement between variables. These failures were then assessed using semistructured interviews with purposively sampled domain experts. Results The DQ of the fields in our dataset was measured to be from 0% adherent up to 100%. Understanding the data provenance of fields with DQ failures enabled us to ascertain if each DQ failure was fatal, recoverable, or not relevant to the field's inclusion in our study. We also identify the themes of data provenance from a DQ perspective as systems, processes, and actors. Conclusion A systematic approach to understanding data provenance through the context of data generation helps in the reconciliation or repair of DQ failures and is a necessary step in the preparation of data for secondary use.

show abstract

Section: Discussionmentioning

confidence: 55%

Section: Discussionmentioning

confidence: 99%

A Systematic Approach to Reconciling Data Quality Failures: Investigation Using Spinal Cord Injury Data

2021

View full text Add to dashboard Cite

show abstract

“…To move from heterogenous and proprietary EHR data to OMOP that aligns to the FAIR principles [5], we developed a target oriented concept based on medical expertise and an EHR as well as an OMOP analysis (Figure 1). Working with EHR data in research requires a deep understanding of the original data (e.g data origin, data completeness, data correctness, data structure) and the given target environment for research [2]. Thus we built a multidisciplinary team of data and computer scientists as well as pharmacists.…”

Section: Methodsmentioning

confidence: 99%

“…The pandemic of coronavirus disease 2019 (COVID-19) has shown the need of standardized and reproducible research data, especially regarding drug administration, as observational studies are important to gain evidence, learn on real-word data and improve the COVID-19 patient treatment and their effects in the future [1]. However, those studies highly depend on the level of data quality, interoperability and reproducibility, even more if they are proceeded in a multi-centric environment [2].…”

Section: Introductionmentioning

confidence: 99%

Transfer of Clinical Drug Data to a Research Infrastructure on OMOP – A FAIR Concept

Reinecke

Zoch

Wilhelm

et al. 2021

Applying the FAIR Principles to Accelerate Health Research in Europe in the Post COVID-19 Era

View full text Add to dashboard Cite

Generating evidence based on real-world data is gaining importance in research not least since the COVID-19 pandemic. The Common Data Model of Observational Medical Outcomes Partnership (OMOP) is a research infrastructure that implements FAIR principles. Although the transfer of German claim data to OMOP is already implemented, drug data is an open issue. This paper provides a concept to prepare electronic health record (EHR) drug data for the transfer to OMOP based on requirements analysis and descriptive statistics for profiling EHR data developed by an interdisciplinary team and also covers data quality issues. The concept not only ensures FAIR principles for research, but provides the foundation for German drug data to OMOP transfer.

show abstract

“…High quality data sets are key assets for predicting prognosis and drug response from phenotypic, genotypic, and epigenetic data through innovative clinical trials and large-scale observational studies. However, in the rapidly growing field of real-world evidence generation, it is crucial to more critically evaluate EHR-driven studies the veracity of the data used to support the conclusions in order to prevent harm from misleading studies [15,16]. The accuracy of electronic health record data matters more than ever, especially due to the proliferation of clinical decision support, workflow systems and learning health systems.…”

Section: Ethical Legal Social Policy Issues and Solutions Stakeholder Participation And Research Networkmentioning

confidence: 99%

Key Contributions in Clinical Research Informatics

Daniel

Bellamine²

2021

Yearb Med Inform

View full text Add to dashboard Cite

Summary Objectives: To summarize key contributions to current research in the field of Clinical Research Informatics (CRI) and to select best papers published in 2020. Method: A bibliographic search using a combination of Medical Subject Headings (MeSH) descriptors and free-text terms on CRI was performed using PubMed, followed by a double-blind review in order to select a list of candidate best papers to be then peer-reviewed by external reviewers. After peer-review ranking, a consensus meeting between two section editors and the editorial team was organized to finally conclude on the selected four best papers. Results: Among the 877 papers published in 2020 and returned by the search, there were four best papers selected. The first best paper describes a method for mining temporal sequences from clinical documents to infer disease trajectories and enhancing high-throughput phenotyping. The authors of the second best paper demonstrate that the generation of synthetic Electronic Health Record (EHR) data through Generative Adversarial Networks (GANs) could be substantially improved by more appropriate training and evaluation criteria. The third best paper offers an efficient advance on methods to detect adverse drug events by computer-assisting expert reviewers with annotated candidate mentions in clinical documents. The large-scale data quality assessment study reported by the fourth best paper has clinical research informatics implications, in terms of the trustworthiness of inferences made from analysing electronic health records. Conclusions: The most significant research efforts in the CRI field are currently focusing on data science with active research in the development and evaluation of Artificial Intelligence/Machine Learning (AI/ML) algorithms based on ever more intensive use of real-world data and especially EHR real or synthetic data. A major lesson that the coronavirus disease 2019 (COVID-19) pandemic has already taught the scientific CRI community is that timely international high-quality data-sharing and collaborative data analysis is absolutely vital to inform policy decisions.

show abstract

What Every Reader Should Know About Studies Using Electronic Health Record Data but May Be Afraid to Ask

Cited by 90 publications

References 26 publications

A Systematic Approach to Reconciling Data Quality Failures: Investigation Using Spinal Cord Injury Data

A Systematic Approach to Reconciling Data Quality Failures: Investigation Using Spinal Cord Injury Data

Transfer of Clinical Drug Data to a Research Infrastructure on OMOP – A FAIR Concept

Key Contributions in Clinical Research Informatics

Contact Info

Product

Resources

About