Background
Primary care data in the UK are widely used for cancer research, but the reliability of recording key events such as diagnoses remains uncertain. Data linkage can mitigate these uncertainties; however, researchers may avoid linkage due to high costs, tight timelines, and sample size limitations. Hence, this study aimed to assess the quality of prostate cancer (PCa) diagnoses in primary care. We utilised Clinical Practice Research Datalink (CPRD) primary care data linked to National Cancer Registration and Analysis Service (NCRAS) and Hospital Episode Statistics (HES) in England. We compared accuracy, completeness, and timing of diagnosis recording between sources to facilitate decision-making regarding data source selection for future research.

Methods
Incident PCa diagnoses (2000-2016) for males aged ≥46 years recorded in at least one study data source were examined. The accuracy of a data source was estimated as the proportion of diagnoses recorded in that source that were also confirmed by any linked source. Completeness was estimated as the proportion of all diagnoses in the linked sources that had a matching diagnosis in that source.

Results
The study included 51,487 PCa patients identified in at least one data source. CPRD demonstrated 86.9% accuracy and 68.2% completeness against NCRAS and 75.1% accuracy and 61.1% completeness against HES. Overall, CPRD showed the highest accuracy (93%) but the lowest completeness (60.7%). Diagnosis dates in CPRD were more concordant with NCRAS (90.6% within 6 months) than with HES (61.2%). Accuracy and completeness improved over time, especially after 2004. Diagnosis dates were recorded a median of 2 weeks later in CPRD than in NCRAS and 1 week later in CPRD than in HES. CPRD Aurum exhibited better quality than GOLD.

Conclusions
While the accuracy of PCa diagnoses in CPRD compared to linked sources was high, completeness was low.
Therefore, linking to HES or NCRAS should be considered for improved case capture, acknowledging their inherent limitations.
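
The accuracy and completeness definitions given in the Methods can be illustrated with a minimal sketch. It assumes each data source is reduced to a set of patient identifiers with an incident PCa diagnosis. The function names, variable names, and toy identifiers are hypothetical and invented for illustration; they are not study data and do not reproduce the study's matching procedure.

```python
from __future__ import annotations


def accuracy(source: set[str], linked: set[str]) -> float:
    """Proportion of diagnoses in `source` that are also found in `linked`."""
    if not source:
        raise ValueError("source contains no diagnoses")
    return len(source & linked) / len(source)


def completeness(source: set[str], linked: set[str]) -> float:
    """Proportion of diagnoses in `linked` that have a match in `source`."""
    if not linked:
        raise ValueError("linked sources contain no diagnoses")
    return len(source & linked) / len(linked)


# Toy patient identifiers (assumption: invented, not taken from the study).
cprd = {"p1", "p2", "p3", "p4"}
ncras = {"p2", "p3", "p4", "p5", "p6"}
hes = {"p3", "p4", "p6", "p7"}

# "Any linked source" is taken here as the union of the two linked datasets.
linked_any = ncras | hes

cprd_accuracy = accuracy(cprd, linked_any)          # 3 of 4 CPRD cases confirmed
cprd_completeness = completeness(cprd, linked_any)  # 3 of 6 linked cases captured
```

Under these definitions a source can score high on accuracy while scoring low on completeness, which is the pattern reported for CPRD: most diagnoses it records are confirmed elsewhere, but many diagnoses recorded elsewhere are absent from it.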