Objectives: UK statistics suggest only two-thirds of patients with dementia get a diagnosis recorded in primary care. General practitioners (GPs) report barriers to formally diagnosing dementia, so some patients may be known by GPs to have dementia but may be missing a diagnosis in their patient record. We aimed to produce a method to identify these 'known but unlabelled' patients with dementia using data from primary care patient records.
Design: Retrospective case-control study using routinely collected primary care patient records from Clinical Practice Research Datalink.
Setting: UK general practice.
Participants: English patients aged >65 years, with a coded diagnosis of dementia recorded in 2000–2012 (cases), matched 1:1 with patients with no diagnosis code for dementia (controls).
Interventions: Eight coded and nine keyword concepts indicating symptoms, screening tests, referrals and care for dementia recorded in the 5 years before diagnosis. We trialled machine learning classifiers to discriminate between cases and controls (logistic regression, naïve Bayes, random forest).
Primary and secondary outcomes: The outcome variable was dementia diagnosis code; the accuracy of classifiers was assessed using area under the receiver operating characteristic curve (AUC); the order of features contributing to discrimination was examined.
Results: 93 426 patients were included; the median age was 83 years (64.8% women). Three classifiers achieved high discrimination and performed very similarly. AUCs were 0.87–0.90 with coded variables, rising to 0.90–0.94 with keywords added. Feature prioritisation was different for each classifier; commonly prioritised features were Alzheimer's prescription, dementia annual review, memory loss and dementia keywords.
Conclusions: It is possible to detect patients with dementia who are known to GPs but unlabelled with a diagnostic code, with a high degree of accuracy in electronic primary care record data.
Using keywords from clinic notes and letters improves accuracy compared with coded data alone. This approach could improve identification of dementia cases for record-keeping, service planning and delivery of good quality care.
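
To make the AUC figures above concrete, the following is a minimal sketch of how discrimination between cases and controls can be measured. It is not the study's pipeline: the records are invented toy data, the concept names are placeholders, and the score (a simple count of recorded concepts) is a hypothetical stand-in for the fitted logistic regression, naïve Bayes and random forest classifiers. The AUC is computed as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half.

```python
def auc(labels, scores):
    """AUC as P(case score > control score), counting ties as 0.5."""
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    control_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy records (invented for illustration): each patient is the set of
# dementia-related concepts found in their record before the index date.
records = [
    ({"memory loss", "dementia annual review"}, 1),           # case
    ({"memory loss"}, 1),                                     # case
    ({"referral", "screening test", "memory loss"}, 1),       # case
    (set(), 0),                                               # control
    ({"memory loss"}, 0),                                     # control
    (set(), 0),                                               # control
]

labels = [label for _, label in records]
# Hypothetical score: number of concepts present. A real classifier would
# output a predicted probability here instead.
scores = [len(concepts) for concepts, _ in records]

result = auc(labels, scores)
print(round(result, 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported 0.87–0.94 should be read.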