2022
DOI: 10.1055/a-1910-4154

Real-World Matching Performance of Deidentified Record-Linking Tokens

Abstract: Objective: To evaluate tokens commonly used by clinical research consortia to aggregate clinical data across institutions. Materials and Methods: This study compares tokens alone and token-based matching algorithms against manual annotation for 20,002 record pairs extracted from University of Texas Houston (UTH)’s clinical data warehouse in terms of entity resolution. Results: The highest precision achieved was 99.9% with a token derived from the first name, last name, gender, date-of-birth. The highest rec…

Cited by 12 publications (15 citation statements)
References 10 publications
“…The use of deidentified tokens for record matching across research consortia and between identified research databases and anonymized public databases has been growing [ 18 ], and a recent study reported 99% precision for matching among 20,002 record pairs when first name, last name, gender, and date of birth were tokenized [ 19 ]. However, as this analysis employed a novel use of artificial intelligence capabilities on both commercial and CMS payor databases, namely Medicaid claims, it has inherent limitations.…”
Section: Discussion (mentioning)
confidence: 99%
“…ENRGY is a national rheumatology practice–based research network that includes more than 700 community rheumatologists throughout the United States. The EHR data were linked to Medicare pharmacy and medical claims using unique person‐specific identifiers (eg, Medicare Beneficiary Identifiers) and to commercial pharmacy and medical claims data (Optum) using third‐party software (Datavant) via deterministic linkage using name, sex, date of birth, and at least one matching physician office visit date 6 . The study period included EHR records and claims data from 2007 through 2020.…”
Section: Methods (mentioning)
confidence: 99%
“…The EHR data were linked to Medicare pharmacy and medical claims using unique person-specific identifiers (eg, Medicare Beneficiary Identifiers) and to commercial pharmacy and medical claims data (Optum) using third-party software (Datavant) via deterministic linkage using name, sex, date of birth, and at least one matching physician office visit date. 6 The study period included EHR records and claims data from 2007 through 2020. Medications were normalized between data sources using RxNorm medication ontology, one of the vocabularies represented in the Unified Medical Language System.…”
Section: Methods (mentioning)
confidence: 99%
“…Tokenization assigns hashes and tokens to a group of identifier fields to encrypt a data file that contains PHI before it is shared with another organization and linked to a larger dataset. 84 The data linkage would then occur by comparing the hashes or tokens, which must match exactly, between the linked files. 85 Through the replacement of individual private health information with random inputs as tokens, the ability to link individual data across datasets (ie, public health data) to gain greater analytical insights and maximize privacy and reverse identification of patients can be achieved.…”
Section: Future State Of Phi and Public Health Informatics (mentioning)
confidence: 99%
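
The last statement above describes tokenization as hashing a group of identifier fields and then linking files by exact token equality. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the scheme used by the cited study or by any commercial product: the field set (first name, last name, gender, date of birth) is taken from the abstract, while the keyed SHA-256 hash, the normalization rules, and the names `normalize`, `make_token`, `link_records`, and `secret_key` are hypothetical choices made for this example.

```python
import hashlib
import hmac


def normalize(value: str) -> str:
    """Lowercase and remove all whitespace so trivial formatting
    differences do not change the token (assumed normalization rule)."""
    return "".join(value.split()).lower()


def make_token(first_name: str, last_name: str, gender: str,
               dob: str, secret_key: bytes) -> str:
    """Derive a deidentified token from four identifier fields.

    Uses HMAC-SHA256 with a shared secret key (an assumption for this
    sketch), so tokens cannot be recomputed without the key.
    """
    fields = [first_name, last_name, gender, dob]
    message = "|".join(normalize(f) for f in fields).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def link_records(site_a, site_b):
    """Exact-match linkage: pair record IDs whose tokens are identical.

    Each input is an iterable of (record_id, token) tuples.
    """
    index = {}
    for rec_id, token in site_a:
        index.setdefault(token, []).append(rec_id)
    return [(a_id, b_id)
            for b_id, token in site_b
            for a_id in index.get(token, [])]


if __name__ == "__main__":
    key = b"example-shared-key"  # hypothetical key, for illustration only
    a = [("a1", make_token("Ada", "Lovelace", "F", "1815-12-10", key))]
    b = [("b1", make_token(" ada ", "LOVELACE", "f", "1815-12-10", key)),
         ("b2", make_token("Ada", "Lovelace", "F", "1815-12-11", key))]
    print(link_records(a, b))  # [('a1', 'b1')]
```

Because the tokens must match exactly, any difference that survives normalization (a misspelled name, a mistyped date of birth) yields a different token and a missed link, which is why such tokens tend to favor precision over recall.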