Purpose
In Germany, record linkage of claims and cancer registry data is cost- and time-consuming because, until recently, no unique personal identifier was available in both data sources. The aim of this study was to evaluate the feasibility and performance of a deterministic linkage procedure based on indirect personal identifiers included in the data sources.
Methods
In the German Pharmacoepidemiological Research Database (GePaRD), we identified users of glucose-lowering drugs residing in four federal states in Northern and Southern Germany (Bavaria, Bremen, Hamburg, Lower Saxony) and assessed colorectal and thyroid cancer cases among them. The cancer registries of these federal states selected all colorectal and thyroid cancer cases diagnosed between 2004 and 2015. A deterministic linkage was performed based on indirect personal identifiers, such as year of birth, sex, area of residence and type of cancer, together with the requirement that the dates of cancer diagnosis in the two data sources differ by at most 90 days. The results were compared with those of a probabilistic linkage using “direct” personal identifiers (gold standard).
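The deterministic matching rule can be sketched in a few lines of Python. This is a minimal illustration, not the study's implementation: the `Record` fields, the area codes such as `"HB-01"`, and the function names are hypothetical. The sketch also returns every candidate pair, because the abstract does not say how claims records matching several registry records were resolved.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    record_id: str
    birth_year: int
    sex: str
    area: str            # hypothetical area-of-residence code
    cancer_type: str     # e.g. "colorectal" or "thyroid"
    diagnosis_date: date


MAX_DAYS = 90  # maximum allowed gap between diagnosis dates


def blocking_key(r: Record):
    # Indirect identifiers that must agree exactly between the two sources.
    return (r.birth_year, r.sex, r.area, r.cancer_type)


def deterministic_link(claims, registry, max_days=MAX_DAYS):
    """Return (claims_id, registry_id) pairs that agree on all indirect
    identifiers and whose diagnosis dates are at most max_days apart."""
    index = defaultdict(list)
    for r in registry:
        index[blocking_key(r)].append(r)
    pairs = []
    for c in claims:
        for r in index.get(blocking_key(c), []):
            if abs((c.diagnosis_date - r.diagnosis_date).days) <= max_days:
                pairs.append((c.record_id, r.record_id))
    return pairs


# Hypothetical example records (not study data).
claims = [
    Record("C1", 1950, "F", "HB-01", "colorectal", date(2010, 3, 1)),
    Record("C2", 1960, "M", "BY-07", "thyroid", date(2012, 5, 10)),
]
registry = [
    Record("R1", 1950, "F", "HB-01", "colorectal", date(2010, 4, 15)),
    Record("R2", 1960, "M", "BY-07", "thyroid", date(2012, 12, 1)),
]

if __name__ == "__main__":
    print(deterministic_link(claims, registry))  # C1 links to R1; C2's dates are too far apart
```

Indexing the registry by the exact-match identifiers (blocking) means only records that already agree on year of birth, sex, area and cancer type are compared on diagnosis date, which avoids comparing every claims record with every registry record.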
Results
The deterministic linkage procedure yielded a sensitivity of 71.8% for colorectal cancer and 66.6% for thyroid cancer. For thyroid cancer, sensitivity improved to 71.4% when only inpatient diagnoses were used to define cancer in GePaRD. Specificity was above 99% in all analyses. When cancer cases were defined using the probabilistic linkage, the estimated risk of colorectal cancer was 10 percentage points lower than with the deterministic approach.
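Sensitivity and specificity here compare the deterministic result against the probabilistic gold standard. The sketch below computes both from sets of patient identifiers. It assumes the unit of analysis is the patient in the GePaRD cohort, classified as a linked cancer case or not. The function name and the toy data are hypothetical, and the sketch does not reproduce the study's figures.

```python
def sensitivity_specificity(all_ids, gold_cases, test_cases):
    """Compare cases found by a test linkage with gold-standard cases.

    all_ids:    every patient in the cohort
    gold_cases: patients linked as cases by the gold standard
    test_cases: patients linked as cases by the method under evaluation
    """
    cohort = set(all_ids)
    gold = set(gold_cases) & cohort
    test = set(test_cases) & cohort
    tp = len(gold & test)            # cases found by both
    fn = len(gold - test)            # gold-standard cases missed
    fp = len(test - gold)            # links not confirmed by the gold standard
    tn = len(cohort - gold - test)   # non-cases in both
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy cohort of 1,000 patients.
sens, spec = sensitivity_specificity(
    all_ids=range(1, 1001),
    gold_cases=range(1, 11),                # 10 gold-standard cases
    test_cases=[1, 2, 3, 4, 5, 6, 7, 11],   # 7 true links, 1 false link
)
print(sens, spec)
```

Because cancer cases make up only a small share of the cohort, true negatives dominate the specificity denominator. Specificity can therefore stay high even when sensitivity is modest.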
Conclusions
The sensitivity of the deterministic linkage approach appears too low for it to be considered a reasonable alternative to the probabilistic linkage procedure.