Objective
This study sought to support evidence-based patient identity policy development by illustrating an approach for formally evaluating operational matching methods, and to characterize the performance of referential and probabilistic patient matching algorithms using real-world demographic data.
Materials and Methods
We assessed the matching accuracy of referential and probabilistic algorithms against a manually reviewed, 30 000-record gold standard reference dataset derived from a large health information exchange containing over 47 million patient registrations. We applied both algorithms to this dataset, compared their outputs to the gold standard, and computed performance metrics including sensitivity (recall), positive predictive value (precision), and F-score for each algorithm.
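As an illustration of how such metrics can be derived, the following is a minimal sketch that assumes both the algorithm output and the gold standard are expressed as sets of record-ID pairs judged to be the same patient, and that the F-score is the balanced F1 (harmonic mean of PPV and sensitivity). The function names and data layout are hypothetical and are not taken from the study.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def normalize(pairs: Iterable[Pair]) -> Set[Pair]:
    """Order each pair so that (a, b) and (b, a) count as the same link."""
    return {tuple(sorted(p)) for p in pairs}


def matching_metrics(predicted: Iterable[Pair], gold: Iterable[Pair]) -> Dict[str, float]:
    """Compare predicted match pairs with gold standard match pairs."""
    pred = normalize(predicted)
    true = normalize(gold)

    tp = len(pred & true)   # pairs linked by the algorithm and by reviewers
    fp = len(pred - true)   # pairs linked by the algorithm only
    fn = len(true - pred)   # true matches the algorithm missed

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    # Assumption: F-score means balanced F1.
    denom = ppv + sensitivity
    f_score = 2 * ppv * sensitivity / denom if denom else 0.0

    return {"sensitivity": sensitivity, "ppv": ppv, "f_score": f_score}


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    gold_pairs = [("r1", "r2"), ("r3", "r4"), ("r5", "r6")]
    predicted_pairs = [("r2", "r1"), ("r3", "r4"), ("r7", "r8")]
    print(matching_metrics(predicted_pairs, gold_pairs))
```

Normalizing pair order before comparison avoids counting the same link twice when two systems emit record IDs in different orders.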
Results
The probabilistic algorithm exhibited sensitivity, positive predictive value (PPV), and F-score of 0.6366, 0.9995, and 0.7778, respectively. The referential algorithm exhibited sensitivity, PPV, and F-score of 0.9351, 0.9996, and 0.9663, respectively. Treating discordant and limited-data records as nonmatches increased referential match sensitivity to 0.9578. Compared to the more traditional probabilistic approach, referential matching exhibited greater accuracy.
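As a consistency check, and assuming the F-score here is the balanced F1 (the harmonic mean of PPV and sensitivity), the reported F-scores can be reproduced from the reported sensitivity and PPV values:

```latex
F = \frac{2 \cdot \mathrm{PPV} \cdot \mathrm{Sensitivity}}{\mathrm{PPV} + \mathrm{Sensitivity}}

F_{\text{probabilistic}} = \frac{2 \times 0.9995 \times 0.6366}{0.9995 + 0.6366} \approx 0.7778

F_{\text{referential}} = \frac{2 \times 0.9996 \times 0.9351}{0.9996 + 0.9351} \approx 0.9663
```

Because PPV is nearly identical for the two algorithms, the difference in F-score is driven almost entirely by sensitivity.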
Conclusions
Referential patient matching, an increasingly popular method among health IT vendors, demonstrated notably greater accuracy than a more traditional probabilistic approach, and did so without the data-specific adaptation of the algorithm that the probabilistic approach usually requires. Given the need for an evidence-based patient identity strategy, health IT policymakers, including the Office of the National Coordinator for Health Information Technology (ONC), should explore strategies to expand the evidence base for real-world matching system performance.