Fitness for purpose of routinely recorded health data to identify patients with complex diseases: The case of Sjögren's syndrome

Wiegersma, Sytske; Flinterman, Linda E.; Seghieri, Chiara; Baldini, Chiara; Paget, John; Cortés, Juan José Baztán; Verheij, Robert

doi:10.1002/lrh2.10242

Cited by 4 publications

(6 citation statements)

References 14 publications


“…These features are in line with reported symptoms in the literature [1][2][3][24]. However, L99 (other musculoskeletal diseases), F99 (other diseases eyes) and non-Hodgkin's disease (B72.02) were not as important as expected when compared to previous studies [4,24]. None of the three diseases above were in the feature importance top-15 for either of the models.…”

Section: Discussion (supporting)

confidence: 88%

“…The prevalence of pSS varies greatly across studies, with a point estimate of 0.61‰, but ranging from 0.11-37.9‰ [3]. Currently, no separate code for pSS is present in the International Classification of Primary Care (ICPC) coding system [4]. With the growing usage of real-world data derived from administrative and clinical data, new possibilities for the earlier recognition and diagnosis of complex diseases with low prevalence like pSS arise.…”

Section: Open Access (mentioning)

confidence: 99%

“…3 Tilburg School of Social and Behavioral Sciences, Tilburg University, Tilburg, the Netherlands. 4 Faculty of Science, Computer Science, Artificial Intelligence, Vrije Universiteit, Amsterdam, the Netherlands. 5 Institute of Management, Sant'Anna School of Advanced Studies, Pisa, Italy.…”

Section: Abbreviations (mentioning)

confidence: 99%


Detection of primary Sjögren’s syndrome in primary care: developing a classification model with the use of routine healthcare data and machine learning

et al. 2022

Self Cite


Background: Primary Sjögren’s Syndrome (pSS) is a rare autoimmune disease that is difficult to diagnose due to a variety of clinical presentations, resulting in misdiagnosis and late referral to specialists. To improve early-stage disease recognition, this study aimed to develop an algorithm to identify possible pSS patients in primary care. We built a machine learning algorithm which was based on combined healthcare data as a first step towards a clinical decision support system.

Method: Routine healthcare data, consisting of primary care electronic health records (EHRs) data and hospital claims data (HCD), were linked on patient level and consisted of 1411 pSS and 929,179 non-pSS patients. Logistic regression (LR) and random forest (RF) models were used to classify patients using age, gender, diseases and symptoms, prescriptions and GP visits.

Results: The LR and RF models had an AUC of 0.82 and 0.84, respectively. Many actual pSS patients were found (sensitivity LR = 72.3%, RF = 70.1%), specificity was 74.0% (LR) and 77.9% (RF) and the negative predictive value was 99.9% for both models. However, most patients classified as pSS patients did not have a diagnosis of pSS in secondary care (positive predictive value LR = 0.4%, RF = 0.5%).

Conclusion: This is the first study to use machine learning to classify patients with pSS in primary care using GP EHR data. Our algorithm has the potential to support the early recognition of pSS in primary care and should be validated and optimized in clinical practice. To further enhance the algorithm in detecting pSS in primary care, we suggest it is improved by working with experienced clinicians.
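The very low positive predictive values in this abstract follow largely from the class imbalance rather than from poor discrimination. A minimal sketch of that arithmetic is given here, using only the sensitivity, specificity and cohort sizes quoted above. It assumes the evaluation data had the same pSS share as the full linked cohort (1411 vs 929,179), which the abstract does not state; the function name `predictive_values` is hypothetical and this is not the authors' code.

```python
def predictive_values(sensitivity, specificity, n_pos, n_neg):
    """Return (PPV, NPV) implied by sensitivity, specificity and class counts."""
    tp = sensitivity * n_pos   # expected true positives
    fn = n_pos - tp            # expected missed cases
    tn = specificity * n_neg   # expected true negatives
    fp = n_neg - tn            # expected false alarms
    return tp / (tp + fp), tn / (tn + fn)


# Cohort sizes and metrics as quoted in the abstract.
# Assumption: the same class ratio applies to the data the metrics were computed on.
N_PSS, N_NON_PSS = 1411, 929_179
REPORTED = {"LR": (0.723, 0.740), "RF": (0.701, 0.779)}

for name, (sens, spec) in REPORTED.items():
    ppv, npv = predictive_values(sens, spec, N_PSS, N_NON_PSS)
    print(f"{name}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Under that assumption the computed values round to the reported PPVs (0.4% and 0.5%) and NPV (99.9%): with roughly 1.5 cases per 1000 patients, even a specificity near 75% yields far more false positives than true positives.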


“…Determining whether a phenotyping algorithm can be applied to a dataset is not only a methodological task, but also a data quality issue and mechanisms are needed to test data sets for fitness of purpose with respect to a particular algorithm, particularly when portability occurs between health settings. Wiegersma et al 10 look into the methods for such testing on the example of Sjögren's syndrome code list‐based phenotype algorithm developed from the Dutch national primary care database and applied to a hospital insurance claim database.…”

Section: The State Of Research In Phenomics: What This Special Issue (mentioning)

confidence: 99%

Why does human phenomics matter today?

Ćurčin, 2020, Learning Health Systems


“…These clinicians are commonly the most junior members of the clinical team, usually in training, and errors or omissions in diagnoses are rarely reviewed or corrected (Nicholls et al, 2017; Tang et al, 2017). Despite this, medical records have traditionally been the clinical reference standard against which ICD-10 codes are validated [Welk and Kwong (2017); Wiegersma et al (2020)]. Furthermore, as more than one recent review has pointed out (McCormick et al, 2014; Metcalfe et al, 2012; Rubbo et al, 2015), for practical reasons most validation studies are restricted to cases with an ICD-10 code for AMI.…”

Section: Introduction (mentioning)

confidence: 99%

Validation of acute myocardial infarction (AMI) in electronic medical records: the SPEED-EXTRACT Study

Saavedra, Morris, Tam, et al. 2020

Preprint


Objectives: To determine whether data captured in electronic medical records (eMR) is sufficient to serve as a clinical data source to make a reliable determination of ST elevation myocardial infarction (STEMI) and non-ST elevation myocardial infarction (NSTEMI) and to use these eMR derived diagnoses to validate ICD-10 codes for STEMI and NSTEMI.

Design: Retrospective validation by blind chart review of a purposive sample of patients with a troponin test result, ECG record, and medical note available in the eMR.

Setting: Two local health districts containing two tertiary hospitals and six referral hospitals in New South Wales, Australia.

Participants: N = 897 adult patients who had a hs-troponin test result indicating suspected AMI.

Primary outcome measures: Inter-rater reliability of clinical diagnosis (κ) for ST-elevated myocardial infarction (STEMI) and Non-ST elevated myocardial infarction (NSTEMI); and sensitivity, specificity, and positive predictive value (PPV) of ICD-10 codes for STEMI and NSTEMI.

Results: The diagnostic agreement between clinical experts was high for STEMI (κ = 0.786) but lower for NSTEMI (κ = 0.548). ICD-10 STEMI codes had moderate sensitivity (Se = 88±6.7), very high specificity (Sp = 99±0.7) and high positive predictive value (PPV = 91±6). NSTEMI ICD-10 codes were lower in each case (Se = 69±6.4, Sp = 96.0±1.5, PPV = 84±6).

Conclusions: The eMR held sufficient clinical data to reliably diagnose STEMI, producing high inter-rater agreement among our expert reviewers as well as allowing reasonably precise estimates of the accuracy of administrative ICD-10 codes. However the clinical detail held in the eMR was less sufficient to diagnose NSTEMI, indicated by a lower inter-rater agreement. Efforts should be directed towards operationalising the clinical definition of NSTEMI and improving clinical record keeping to enable an accurate description of the clinical phenotype in the eMR, and thus improve reliability of the diagnosis of NSTEMI using these data sources.

Article Summary: Strengths and limitations of this study

Expert chart review provided a robust evaluation of the reliability and sufficiency of data directly extracted from the EMR for the diagnosis of AMI.

Computational interrogation and extraction of the eMR (via SPEED-EXTRACT) allowed us to use a wide selection for inclusion in the sample on the basis of clinical data independent of ICD-10 code, enabling the capture of missed cases (i.e., uncoded AMI) and so determine estimates for the false negative rate and sensitivity.

Results were necessarily based on the subset of patients with sufficient clinical data in the eMR. Inferences from this subset to the wider patient pool will be biased when the availability of records varies with diagnosis.

At least two sources of uncertainty in the gold reference standard we used are indistinguishable: uncertainty due to poor clinical detail in the eMR, and uncertainty due to a weak operational definition of the diagnosis (e.g., NSTEMI).
