Objectives: Electronic Health Record (EHR) data have created unique opportunities for research. However, these data are uncurated, siloed and poorly integrated. We describe linkage of EHR data from an entire health service with government datasets to establish a linked geographic cohort within the Australian National Centre for Healthy Ageing (NCHA). Approach: Research-suitable EHR items were identified from Peninsula Health (NCHA partner) data systems based on published research, availability and quality. Items underwent end-user Delphi processes to identify core research items (consensus=70%). Approvals were obtained from the Australian Institute of Health and Welfare (AIHW) for linkage with Medicare, medication dispensings, Aged Care and death registry data through the AIHW spine, created using identifiers from the Medicare Consumer Directory (MCD), and from the Centre for Victorian Data Linkage for linkage to state-wide hospital data. Identifiers for local residents aged ≥60 years who attended Peninsula Health were submitted for probabilistic data linkage. Results: Delphi participants included 10 researchers from 8 fields/departments and 13 clinicians from 11 clinical areas. To date, 7 of the 11 datasets have been reviewed. A total of 107 potentially suitable data items were identified and 96 gained consensus for inclusion in the core dataset. Of the 49,767 Health Service users (episodes: Jan 2010-Dec May 2021) submitted for linkage, 98.4% were successfully linked to the MCD (median age 72.2 years, 52.2% female, 1.8% regional residence). An additional 172,290 individuals living within the geographic region but not contained within the EHR dataset were identified in the MCD for linkage to the government datasets. Linkage accuracy was impacted by inaccurate/incomplete address fields (~30%) and lack of adherence to naming conventions within the EHR data. Conclusion: Linking with EHR data is complex. Having an established EHR research dataset will improve the feasibility of data linkage and potential for future expansion of linkages within the NCHA. Once merged, the data will be used to underpin a range of research activities related to ageing and dementia.
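The abstract above does not say how the probabilistic linkage was scored. As a rough illustration only, the sketch below uses Fellegi-Sunter style match weights, a common approach that may differ from the method actually used. The field names, the m/u probabilities and the two thresholds are all hypothetical values chosen for the example, not figures from the study.

```python
import math

# Hypothetical per-field probabilities (illustrative, not from the study):
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "given_name": (0.90, 0.02),
    "date_of_birth": (0.97, 0.001),
    "postcode": (0.70, 0.05),  # lower m: address fields are often unreliable
}


def normalise(value):
    """Lower-case and trim a value; treat empty or missing values as None."""
    if value is None:
        return None
    cleaned = str(value).strip().lower()
    return cleaned or None


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum log2 agreement/disagreement weights over the compared fields."""
    total = 0.0
    for field, (m, u) in params.items():
        a = normalise(rec_a.get(field))
        b = normalise(rec_b.get(field))
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def classify(weight, upper=10.0, lower=0.0):
    """Map a weight to a decision using two hypothetical thresholds."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "clerical review"


ehr_record = {"surname": " NGUYEN ", "given_name": "Anh",
              "date_of_birth": "1950-03-14", "postcode": None}
directory_record = {"surname": "Nguyen", "given_name": "Anh",
                    "date_of_birth": "1950-03-14", "postcode": "3199"}
print(classify(match_weight(ehr_record, directory_record)))
```

Skipping missing fields rather than counting them as disagreements is one way to limit the effect of incomplete address data on the score. Inconsistent naming would need more normalisation than the simple trimming and lower-casing shown here.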
Introduction: Digitalisation of Electronic Health Record (EHR) data has created unique opportunities for research. However, these data are routinely collected for operational purposes and so are not curated to the standard required for research. Harnessing such routine data at large scale allows efficient and long-term epidemiological and health services research. Objectives: To describe the establishment of a linked, EHR-derived data platform in the National Centre for Healthy Ageing, Melbourne, Australia, aimed at enabling research targeting national health priority areas in ageing. Methods: Our approach incorporated data validation, curation and warehousing to ensure quality and completeness; end-user engagement and consensus on the platform content; implementation of an artificial intelligence (AI) pipeline for extraction of text-based data items; early consumer involvement; and implementation of routine collection of patient-reported outcome measures in a multisite public health service. Results: Data for a cohort of >800,000 patients collected over a 10-year period have been curated within the platform's research data warehouse. So far, 117 items have been identified as suitable for inclusion from 11 research-relevant datasets held within the health service EHR systems. Data access, extraction and release processes, guided by the Five Safes Framework, are being tested through project use-cases. A natural language processing (NLP) pipeline has been implemented, and a framework for the routine collection and incorporation of patient-reported outcome measures has been developed. Conclusions: We highlight the importance of establishing comprehensive processes for the foundations of a data platform utilising routine data not collected for research purposes. These robust foundations will facilitate future expansion through linkages to other datasets for the efficient and cost-effective study of health related to ageing at a large scale.
Objectives: To develop an ethics and governance framework for the National Centre for Healthy Ageing (NCHA) data platform that supports streamlined access to data for research, transfer of data into secure data environments, linkage with a range of external data sources and incorporation of a variety of data types. Approach: The NCHA data platform is bringing together Electronic Health Data across an entire region for health service and clinical research. Methods used to establish the framework included review of existing national (Australian Institute of Health and Welfare's data governance framework) and international (guiding principles from ISO/IEC 38505-1:2017) frameworks, stakeholder engagement and early piloting through use cases. End-users and executive staff (clinical, research and legal) were consulted to ensure compliance and streamlining with existing processes. A data access governance committee was formed with expertise in data access, linkage of large health data sets, ethics, health data privacy and legal policy. Results: Data governance frameworks and policies from established state registries, large clinical trials and health data sharing and linkage centres (n=7) were reviewed and a summary was presented to the committee. An existing data access and sharing agreement and its principles were chosen as a template based on existing stakeholder collaborations and relevance to the two NCHA institutes (Monash University and Peninsula Health). The draft agreement and principles were modified and piloted for data access use cases (n=6). Feedback from researchers (n=3) was used to refine the framework. The committee identified that additional frameworks, such as those outlined by the Centre for Victorian Data Linkage, will be required to accommodate future data sharing and linkage activities with industry and government. Conclusion: Our work highlighted the importance of developing a robust governance framework that can incorporate a range of data, is acceptable to end-users and has sufficient flexibility to incorporate future, yet-to-be-identified data types. Ongoing work will expand the framework to include additional data linkage activities.
Objectives: Public health service organisations use multiple patient administration and electronic health record systems. We describe the implementation of a data warehouse automation tool within the National Centre for Healthy Ageing (NCHA) data platform to operationalise a research data warehouse and optimise data quality and data provision for health services research. Approach: The traditional data warehouse life cycle comprises repetitive manual tasks and depends on specialist developers. Automation tools overcome most of these inefficiencies. We conducted an internal risk-benefit analysis, which was validated against published literature on data warehouse optimisation and automation. Industry-based data warehouse automation tools were reviewed to align the NCHA requirements with each tool's functionality. Tools were then shortlisted and evaluated over a six-week period on: (1) automation of standard tasks; (2) data pipeline alignment with the World Health Organization's (WHO) Data Quality Review Framework; and (3) resource dependency risk mitigation through a Proof of Concept (PoC). Results: The priority areas identified by the risk-benefit analysis included end-to-end data warehouse automation; auto scripting; connectivity/linkage with multiple sources; reverse/forward engineering; audit trail conformance; scalability; support for multiple data warehouse architectures; automated documentation; data management including data quality; and post-subscription independence. Twenty scientific publications were included in the final literature review (10% within healthcare) and supported the majority of identified priority areas. The industry-based review identified 11 suitable data warehouse/Extract-Transform-Load (ETL) automation tools. Five tools demonstrated adequate performance for task automation, data quality management, reduced dependency on specialist developers and on-premise linkage compatibility. Two automation tools were each tested for 6 weeks through PoC development. One automation tool met 8 out of the 10 automation requirements and was selected for implementation. Conclusion: Data warehouse development processes are complex and time consuming. Tools that automate repetitive tasks and scripting increase consistency while reducing the dependency on specialist staff. Integrated data quality management minimises the time researchers spend pre-processing patient-level data sourced through a semi-automated data warehouse.
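The abstract above does not describe the data quality checks that were built into the pipeline. As a loose illustration of the kind of rule such a pipeline might automate, the sketch below computes per-field completeness and flags one internal inconsistency (discharge before admission). The field names and the choice of checks are assumptions made for the example. They are not taken from the selected tool or from the WHO framework text.

```python
from datetime import date

# Hypothetical required fields for an admitted-episode extract.
REQUIRED_FIELDS = ("patient_id", "admission_date", "discharge_date")


def completeness(records, fields=REQUIRED_FIELDS):
    """Return, per field, the share of records holding a non-empty value."""
    total = len(records)
    shares = {}
    for field in fields:
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        shares[field] = filled / total if total else 0.0
    return shares


def inconsistent_episodes(records):
    """Return records whose discharge date precedes the admission date."""
    flagged = []
    for r in records:
        admitted = r.get("admission_date")
        discharged = r.get("discharge_date")
        if admitted and discharged and discharged < admitted:
            flagged.append(r)
    return flagged


# Invented sample rows, for illustration only.
episodes = [
    {"patient_id": "A1", "admission_date": date(2020, 1, 5),
     "discharge_date": date(2020, 1, 3)},
    {"patient_id": "", "admission_date": date(2020, 2, 1),
     "discharge_date": date(2020, 2, 4)},
]
print(completeness(episodes))
print(len(inconsistent_episodes(episodes)))
```

Running checks like these inside the warehouse load, rather than in each project, is one way the pre-processing burden on researchers could be reduced.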
Objectives: To develop a flexible platform for creating, reviewing and adjudicating annotation of unstructured text. The resulting annotations are used by natural language processing (NLP) models and statistical classifiers to analyse large databases of text, such as electronic health records, that are curated by the National Centre for Healthy Ageing (NCHA) Data Platform. Approach: Automated approaches are essential for large-scale extraction of structured data from unstructured documents. We applied the CogStack suite to annotate clinical text from hospital inpatient records based on the Unified Medical Language System (UMLS) for classifying dementia status. We trained a logistic regression classifier to determine dementia/non-dementia status from the frequency of occurrence of a set of terms provided by experts. The classifier was trained within two cohorts: one with confirmed dementia based on clinical assessment and the other with confirmed non-dementia based on telephone cognitive interview. We used our annotation platform to review the accuracy of concepts assigned by CogStack. Results: There were 368 people with clinically confirmed dementia and 218 who were screen-negative for dementia. Of these, 259 with dementia and 195 without dementia had documents in the inpatient electronic health record system, with 84,045 and 16,950 inpatient documents for the dementia and non-dementia cohorts, respectively. A set of key words pertaining to dementia was generated by a specialist neurologist and a health information manager, and matched to UMLS concepts. The NCHA data platform holds a copy of the inpatient text records (>13 million documents) that has been annotated using CogStack. Annotated documents corresponding to the study cohort were extracted. We tested true positive rates of annotation against 50 concepts judged by a neurologist and health information manager to be relevant to dementia patients by manual review of 100 documents. Conclusion: Automated annotations must be validated. The platform we have developed allows efficient review and correction of annotations so that models can be trained further, or to provide confidence that accuracy is sufficient for subsequent analysis. Implementation within our linked NCHA data platform will allow incorporation of text-based data at scale.
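The classifier described above is a logistic regression over term frequencies. The sketch below shows that idea in plain Python, using stochastic gradient descent in place of whatever library the authors used. The term list and the four toy documents are invented for illustration, since the expert-provided terms and the clinical text are not given in the abstract. Matching is by simple substring count, so negated mentions are not handled.

```python
import math

# Hypothetical keyword list; the study's expert-provided terms are not shown.
TERMS = ["dementia", "alzheimer", "memory loss", "confusion"]

# Invented toy documents with labels (1 = dementia cohort, 0 = non-dementia).
TOY_DOCS = [
    ("History of dementia, ongoing confusion and memory loss.", 1),
    ("Alzheimer disease; dementia noted on admission.", 1),
    ("Admitted with fractured hip, alert and oriented.", 0),
    ("Post-operative confusion resolved; no cognitive concerns.", 0),
]


def term_counts(text, terms=TERMS):
    """Count case-insensitive substring occurrences of each term."""
    lowered = text.lower()
    return [float(lowered.count(term)) for term in terms]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train(features, labels, lr=0.1, epochs=1000):
    """Fit weights and bias by stochastic gradient descent on log loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - y  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict_proba(text, weights, bias):
    """Return the estimated probability that a document is dementia-positive."""
    x = term_counts(text)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


X = [term_counts(doc) for doc, _ in TOY_DOCS]
y = [label for _, label in TOY_DOCS]
weights, bias = train(X, y)
print(round(predict_proba(TOY_DOCS[0][0], weights, bias), 3))
```

With real data, the counts would more plausibly come from validated UMLS concept annotations than from raw substring matches, which is where the manual review of annotation accuracy becomes important.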