Objectives
The Italian project MATRICE aimed to assess how well cases of type 2 diabetes (T2DM), hypertension, ischaemic heart disease (IHD) and heart failure (HF), and their levels of severity, can be automatically extracted from the Health Search/CSD Longitudinal Patient Database (HSD). From the medical records of the general practitioners (GPs) who volunteered to participate, cases were extracted by algorithms based on diagnosis codes, keywords, drug prescriptions and results of diagnostic tests. A random sample of identified cases was validated by interviewing their GPs.

Setting
HSD is a database of primary care medical records. A panel of 12 GPs participated in this validation study.

Participants
300 patients were sampled for each disease, except for HF, where 243 patients were assessed.

Outcome measures
The positive predictive value (PPV) was assessed for the presence/absence of each condition against the GP's response to the questionnaire, and Cohen's κ was calculated for agreement on the severity level.

Results
The PPV was 100% (99% to 100%) for T2DM and hypertension, 98% (96% to 100%) for IHD and 55% (49% to 61%) for HF. Cohen's κ for agreement on the severity level was 0.70 for T2DM and 0.69 for hypertension and IHD.

Conclusions
This study shows that individuals with T2DM, hypertension or IHD can be validly identified in HSD by automated identification algorithms. Automatic queries for levels of severity of the same diseases compare well with the corresponding clinical definitions, but some misclassification occurs. For HF, further research is needed to refine the current algorithm.
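
For readers unfamiliar with the two outcome measures named above, the following is a minimal sketch of how a PPV with a confidence interval and Cohen's κ can be computed. It is not the study's code. The function names, the counts and the severity labels are all hypothetical toy values chosen for illustration, and the normal-approximation (Wald) interval is an assumption: the abstract does not state which interval method the authors used.

```python
from collections import Counter
from math import sqrt


def ppv_with_wald_ci(confirmed, sampled, z=1.96):
    """Return (ppv, lower, upper) for algorithm-identified cases.

    confirmed: sampled cases that the GP confirmed as true cases
    sampled:   all algorithm-identified cases that were validated
    The Wald interval is an assumed method, clipped to [0, 1].
    """
    if sampled <= 0 or not 0 <= confirmed <= sampled:
        raise ValueError("need sampled > 0 and 0 <= confirmed <= sampled")
    ppv = confirmed / sampled
    half_width = z * sqrt(ppv * (1 - ppv) / sampled)
    return ppv, max(0.0, ppv - half_width), min(1.0, ppv + half_width)


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters assigning one category per patient."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of patients given the same category by both.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the two raters' marginal proportions.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / n**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1 - expected)


# Hypothetical toy data, not taken from the study.
toy_ppv, toy_lower, toy_upper = ppv_with_wald_ci(confirmed=90, sampled=120)

toy_algorithm = ["mild", "mild", "severe", "severe", "mild", "severe", "mild", "mild"]
toy_gp = ["mild", "mild", "severe", "mild", "mild", "severe", "severe", "mild"]
toy_kappa = cohen_kappa(toy_algorithm, toy_gp)

print(f"PPV = {toy_ppv:.2f} ({toy_lower:.2f} to {toy_upper:.2f})")
print(f"kappa = {toy_kappa:.2f}")
```

The Wald interval collapses to zero width when the PPV is exactly 0 or 1, so an exact or score-based interval would be a more suitable choice for proportions near the boundary.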