ObjectivesUK statistics suggest only two-thirds of patients with dementia get a diagnosis recorded in primary care. General practitioners (GPs) report barriers to formally diagnosing dementia, so some patients may be known by GPs to have dementia but may be missing a diagnosis in their patient record. We aimed to produce a method to identify these ‘known but unlabelled’ patients with dementia using data from primary care patient records.DesignRetrospective case–control study using routinely collected primary care patient records from Clinical Practice Research Datalink.SettingUK general practice.ParticipantsEnglish patients aged >65 years, with a coded diagnosis of dementia recorded in 2000–2012 (cases), matched 1:1 with patients with no diagnosis code for dementia (controls).InterventionsEight coded and nine keyword concepts indicating symptoms, screening tests, referrals and care for dementia recorded in the 5 years before diagnosis. We trialled machine learning classifiers to discriminate between cases and controls (logistic regression, naïve Bayes, random forest).Primary and secondary outcomesThe outcome variable was dementia diagnosis code; the accuracy of classifiers was assessed using area under the receiver operating characteristic curve (AUC); the order of features contributing to discrimination was examined.Results93 426 patients were included; the median age was 83 years (64.8% women). Three classifiers achieved high discrimination and performed very similarly. AUCs were 0.87–0.90 with coded variables, rising to 0.90–0.94 with keywords added. Feature prioritisation was different for each classifier; commonly prioritised features were Alzheimer’s prescription, dementia annual review, memory loss and dementia keywords.ConclusionsIt is possible to detect patients with dementia who are known to GPs but unlabelled with a diagnostic code, with a high degree of accuracy in electronic primary care record data. Using keywords from clinic notes and letters improves accuracy compared with coded data alone. This approach could improve identification of dementia cases for record-keeping, service planning and delivery of good quality care.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.