Objectives
To understand the landscape of privacy-preserving record linkage (PPRL) applications in public health, assess estimates of PPRL accuracy and privacy, and evaluate factors affecting PPRL adoption.
Materials and Methods
A literature scan examined the accuracy, data privacy, and scalability of PPRL in public health. Twelve interviews with subject matter experts were conducted and coded using an inductive approach to identify factors related to PPRL adoption.
Results
PPRL achieved high linkage quality and accuracy. Across various settings and research questions, its linkage quality was comparable to that of clear-text linkage methods, which require direct personally identifiable information (PII). Accuracy depended on several factors, including the PPRL technique used and the proportion of missing and erroneous values in the underlying data. Strategies to increase adoption include increasing understanding of PPRL, improving data owner buy-in, establishing governance structure and oversight, and developing a public health implementation strategy for PPRL.
Discussion
PPRL protects privacy by eliminating the need to share PII for linkage, but its accuracy and linkage quality depend on factors including the choice of PPRL technique and the specific PII used to create encrypted identifiers. Large-scale implementations linking millions of observations, including PCORnet, the National Institutes of Health N3C, and the Centers for Disease Control and Prevention COVID-19 project, have demonstrated the scalability of PPRL for public health applications.
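As a rough illustration of how encrypted identifiers can stand in for PII, the sketch below derives a keyed hash (HMAC-SHA-256) from normalized name and date-of-birth fields. This is only one simple, exact-match style of PPRL and is not necessarily the technique used by any implementation named above. The key, the field choice, and the function names `normalize` and `make_token` are assumptions made for illustration.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical secret shared by the linking sites; a real deployment would
# manage and rotate this key under its governance agreement.
SECRET_KEY = b"example-shared-secret"


def normalize(value: str) -> str:
    """Lowercase, strip accents, and drop non-alphanumeric characters."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def make_token(first: str, last: str, dob: str, key: bytes = SECRET_KEY) -> str:
    """Build a keyed hash of the normalized PII fields (illustrative only)."""
    message = "|".join(normalize(part) for part in (first, last, dob))
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


# Formatting differences are absorbed by normalization, so these two match.
site_a = make_token("María", "O'Neil", "1980-02-29")
site_b = make_token("maria", "ONEIL", "1980/02/29")

# A one-letter spelling error yields a different token, so the link is missed.
site_c = make_token("Maria", "ONeill", "1980-02-29")

print(site_a == site_b)
print(site_a == site_c)
```

Only the tokens would be exchanged for linkage, never the underlying fields. The missed match for the misspelled surname shows why accuracy is sensitive to errors in the source data and to which PII fields feed the identifier.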
Conclusions
Applications of PPRL in public health have demonstrated its value to the public health community. Although gaps must be addressed before widespread implementation, PPRL is a promising solution to the data linkage challenges faced by the public health ecosystem.