Linkage of population-based administrative data is a valuable tool for combining detailed individual-level information from different sources for research. While not a substitute for classical studies based on primary data collection, analyses of linked administrative data can answer questions that require large sample sizes or detailed data on hard-to-reach populations, and generate evidence with a high level of external validity and applicability for policy making. There are unique challenges in the appropriate research use of linked administrative data, for example with respect to bias from linkage errors where records cannot be linked or are linked together incorrectly. For confidentiality and other reasons, the separation of data linkage processes and analysis of linked data is generally regarded as best practice. However, the ‘black box’ of data linkage can make it difficult for researchers to judge the reliability of the resulting linked data for their required purposes. This article aims to provide an overview of challenges in linking administrative data for research. We aim to increase understanding of the implications of (i) the data linkage environment and privacy preservation; (ii) the linkage process itself (including data preparation, and deterministic and probabilistic linkage methods) and (iii) linkage quality and potential bias in linked data. We draw on examples from a number of countries to illustrate a range of approaches for data linkage in different contexts.
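The difference between deterministic and probabilistic linkage can be sketched in a few lines of Python. This is a generic illustration, not a method described in the article: the field names, the m- and u-probabilities, and the classification thresholds are all hypothetical. Deterministic linkage accepts a pair only if every key field agrees exactly. The probabilistic (Fellegi-Sunter style) approach instead sums log-likelihood-ratio weights across fields and classifies each pair by comparing the total against thresholds.

```python
import math

# Hypothetical per-field probabilities (assumed values, for illustration only):
# m = P(field agrees | same person), u = P(field agrees | different people)
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "birth_date": (0.98, 0.003),
    "sex": (0.99, 0.5),
}


def deterministic_match(a, b, keys=("surname", "birth_date", "sex")):
    """Link only if every key field agrees exactly."""
    return all(a[k] == b[k] for k in keys)


def fs_weight(a, b, probs=FIELD_PROBS):
    """Sum of Fellegi-Sunter log2 agreement/disagreement weights over the fields."""
    total = 0.0
    for field, (m, u) in probs.items():
        if a[field] == b[field]:
            total += math.log2(m / u)  # agreement weight (positive when m > u)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative here)
    return total


def classify(weight, upper=8.0, lower=0.0):
    """Hypothetical thresholds: at or above upper -> link, below lower -> non-link."""
    if weight >= upper:
        return "link"
    if weight < lower:
        return "non-link"
    return "review"  # in-between pairs go to clerical (manual) review


# Invented example records that differ only in surname spelling
a = {"surname": "SMITH", "birth_date": "1970-01-02", "sex": "F"}
b = {"surname": "SMYTH", "birth_date": "1970-01-02", "sex": "F"}

print(deterministic_match(a, b))                             # False
print(round(fs_weight(a, b), 2), classify(fs_weight(a, b)))  # 5.03 review
```

With these assumed values, a single spelling difference defeats the exact-match rule. The same difference still leaves a positive total weight, which places the pair in the band reserved for manual review rather than discarding it outright.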
Background: Ontario, the most populous province in Canada, has a universal healthcare system that routinely collects health administrative data on its 13 million legal residents; these data are used for health research. Record linkage has become a vital tool for this research, enriching these data with the Immigration, Refugees and Citizenship Canada Permanent Resident (IRCC-PR) database and the Office of the Registrar General’s Vital Statistics-Death (ORG-VSD) registry. Our objectives were to estimate linkage rates and to compare the characteristics of individuals in the linked versus unlinked files.
Methods: We used both deterministic and probabilistic linkage methods to link the IRCC-PR database (1985–2012) and the ORG-VSD registry (1990–2012) to Ontario’s Registered Persons Database. Linkage rates were estimated, and standardized differences were used to assess differences in socio-demographic and other characteristics between linked and unlinked records.
Results: The overall linkage rates for the IRCC-PR database and the ORG-VSD registry were 86.4 % and 96.2 %, respectively. The majority (68.2 %) of IRCC-PR records were linked after three deterministic passes, 18.2 % were linked probabilistically, and 13.6 % were unlinked. Similarly, the majority (79.8 %) of ORG-VSD records were linked deterministically, 16.3 % were linked after probabilistic linkage and manual review, and 3.9 % were unlinked. Unlinked and linked files were similar for most characteristics, such as age and marital status for IRCC-PR and sex and most causes of death for ORG-VSD.
However, lower linkage rates were observed among people born in East Asia (78 %) in the IRCC-PR database and for certain causes of death in the ORG-VSD registry, namely perinatal conditions (61.3 %) and congenital anomalies (81.3 %).
Conclusions: The linkage of immigration and vital statistics data to existing population-based healthcare data in Ontario, Canada will enable many novel cross-sectional and longitudinal studies. Analytic techniques to account for sub-optimal linkage rates may be required in studies of certain ethnic groups or of certain causes of death among children and infants.
Electronic supplementary material: The online version of this article (doi:10.1186/s12911-016-0375-3) contains supplementary material, which is available to authorized users.
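The linkage rate and the standardized difference used to compare linked and unlinked records are simple to compute. The sketch below assumes the common pooled-standard-deviation form of the standardized difference, for both a binary characteristic (a proportion) and a continuous one (a mean). The function names and example numbers are hypothetical and are not taken from the study.

```python
import math


def linkage_rate(n_linked, n_total):
    """Percentage of source records that were successfully linked."""
    return 100.0 * n_linked / n_total


def std_diff_binary(p_linked, p_unlinked):
    """Standardized difference between two proportions, using the pooled SD."""
    pooled_sd = math.sqrt(
        (p_linked * (1 - p_linked) + p_unlinked * (1 - p_unlinked)) / 2
    )
    return (p_linked - p_unlinked) / pooled_sd


def std_diff_continuous(mean_linked, sd_linked, mean_unlinked, sd_unlinked):
    """Standardized difference between two means, using the pooled SD."""
    pooled_sd = math.sqrt((sd_linked ** 2 + sd_unlinked ** 2) / 2)
    return (mean_linked - mean_unlinked) / pooled_sd


# Hypothetical example: 864 of 1,000 records linked; 40% vs 60% married
print(linkage_rate(864, 1000))                # 86.4
print(round(std_diff_binary(0.40, 0.60), 3))  # -0.408
```

The standardized difference does not depend on sample size, so it is often preferred over significance tests when comparing large administrative files. An absolute value above 0.1 is a commonly used (though not universal) marker of meaningful imbalance.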
Fractures at typical osteoporotic sites are associated with increased mortality across all age groups, particularly in men. Better understanding of factors associated with increased post-fracture mortality should inform the development of management strategies.
Background: Population-based administrative data have been used to study osteoporosis-related fracture risk factors and outcomes, but there has been limited research about the validity of these data for ascertaining fracture cases. The objectives of this study were to: (a) compare fracture incidence estimates from administrative data with estimates from population-based clinically-validated data, and (b) test for differences in incidence estimates from multiple administrative data case definitions.
Methods: Thirty-five case definitions for incident fractures of the hip, wrist, humerus, and clinical vertebrae were constructed using diagnosis codes in hospital data and diagnosis and service codes in physician billing data from Manitoba, Canada. Clinically-validated fractures were identified from the Canadian Multicentre Osteoporosis Study (CaMos). Generalized linear models were used to test for differences in incidence estimates.
Results: For hip fracture, sex-specific differences were observed in the magnitude of under- and over-ascertainment of administrative data case definitions when compared with CaMos data. The length of the fracture-free period to ascertain incident cases had a variable effect on over-ascertainment across fracture sites, as did the use of imaging, fixation, or repair service codes. Case definitions based on hospital data resulted in under-ascertainment of incident clinical vertebral fractures. There were no significant differences in trend estimates for wrist, humerus, and clinical vertebral case definitions.
Conclusions: The validity of administrative data for estimating fracture incidence depends on the site and features of the case definition.
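The "fracture-free period" that separates incident fractures from follow-up care for an earlier fracture can be sketched as a lookback window. This is an assumed, simplified rule for illustration only (it ignores, for example, any requirement for sufficient prior observation time), and the record layout, the function name, and the 365-day default are hypothetical rather than any of the study's 35 case definitions.

```python
from datetime import date


def incident_fractures(events, lookback_days=365):
    """Keep fracture records with no earlier record at the same site
    (for the same person) within the preceding lookback_days.

    events: iterable of (person_id, site, event_date) tuples (hypothetical layout).
    """
    last_seen = {}  # (person_id, site) -> date of the most recent record
    incident = []
    for pid, site, d in sorted(events, key=lambda e: e[2]):
        key = (pid, site)
        prev = last_seen.get(key)
        # A gap strictly longer than the window counts as a new (incident) fracture
        if prev is None or (d - prev).days > lookback_days:
            incident.append((pid, site, d))
        # Every record, incident or follow-up, restarts the fracture-free clock
        last_seen[key] = d
    return incident


# Invented records: person 1 has a hip fracture, a follow-up visit, and a later record
events = [
    (1, "hip", date(2010, 1, 5)),
    (1, "hip", date(2010, 3, 1)),
    (1, "hip", date(2012, 6, 1)),
    (2, "wrist", date(2011, 2, 2)),
]

print(len(incident_fractures(events)))                      # 3
print(len(incident_fractures(events, lookback_days=1000)))  # 2
```

Lengthening the window reclassifies the 2012 hip record as follow-up of the 2010 fracture, which shows how the choice of fracture-free period can shift incidence estimates.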
Despite increased attention to gaps in osteoporosis management post-fracture in the last 10 years, the situation has not improved: in 2007/2008, fewer than 20% of untreated individuals with a low-trauma fracture received intervention. Novel strategies are required to disseminate and implement best practices at the point of care to reduce the risk of recurrent fractures.