Background
In real‐world evidence research, the reliability of coding in healthcare databases determines the accuracy of code‐based algorithms used to identify conditions such as urinary tract infection (UTI). This study evaluates the performance characteristics of code‐based algorithms for identifying UTI.
Methods
This was a retrospective observational study of adults in three large U.S. administrative claims databases, with data on or after January 1, 2010. A targeted literature review was performed to inform the development of 10 code‐based algorithms to identify UTIs. The algorithms consisted of combinations of diagnosis codes, antibiotic exposure for the treatment of UTIs, and/or ordering of a urinalysis or urine culture. For each database, a probabilistic gold standard was developed using PheValuator. The performance characteristics of each code‐based algorithm were assessed against the probabilistic gold standard.
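As an illustration of how a code‐based algorithm can be scored against a probabilistic gold standard, the following minimal Python sketch accumulates expected confusion‐matrix counts from per‐patient predicted probabilities. It assumes that each patient carries a model‐predicted probability of UTI and a binary flag from the algorithm under evaluation; the names `Patient`, `p_uti`, `flagged`, and `performance` are hypothetical and are not taken from the study or from the PheValuator package.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Patient:
    # Hypothetical fields, assumed for illustration only.
    p_uti: float   # predicted probability of UTI (probabilistic gold standard)
    flagged: bool  # True if the code-based algorithm classified the patient as a case


def _ratio(num: float, den: float) -> Optional[float]:
    # Return None instead of raising when a denominator is zero.
    return num / den if den > 0 else None


def performance(patients: Iterable[Patient]) -> Dict[str, Optional[float]]:
    # Each patient contributes fractionally to the confusion matrix:
    # p_uti counts toward "truly positive", 1 - p_uti toward "truly negative".
    tp = fp = fn = tn = 0.0
    for pt in patients:
        if pt.flagged:
            tp += pt.p_uti
            fp += 1.0 - pt.p_uti
        else:
            fn += pt.p_uti
            tn += 1.0 - pt.p_uti
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }
```

Under these assumptions, a stricter algorithm (for example, one requiring a primary diagnosis code) flags fewer patients, which tends to raise PPV while lowering sensitivity.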
Results
A total of 2 950 641, 1 831 405, and 2 294 929 patients meeting study criteria were identified in the three databases, respectively. Overall, the code‐based algorithm requiring a primary UTI diagnosis code achieved the highest positive predictive values (PPV; >93.8%) but the lowest sensitivities (<12.9%). Algorithms requiring three UTI diagnosis codes achieved similar PPV (>89.9%) and improved sensitivity (<41.6%). Algorithms requiring a single UTI diagnosis code in any position achieved the highest sensitivities (>72.1%) alongside a slight reduction in PPVs (<78.3%). All‐time prevalence estimates of UTI ranged from 21.6% to 48.6%.
Conclusions
Based on these findings, we recommend use of algorithms requiring a single UTI diagnosis code, which achieved high sensitivity and PPV. In studies where PPV is critical, we recommend code‐based algorithms requiring three UTI diagnosis codes rather than a single primary UTI diagnosis code.