Purpose: Our aim was to develop and validate a practical US healthcare claims algorithm for identifying incident lung cancer that improves on the positive predictive value (PPV) and sensitivity observed in past studies. Methods: Patients newly diagnosed with lung cancer in the Surveillance, Epidemiology, and End Results (SEER) registry (gold standard) were linked with Medicare claims. A 5% Medicare "other cancer" sample and a noncancer sample served as controls. A split-sample validation approach was used. Rules-based, regression, and machine learning approaches to algorithm development were explored. Algorithms were developed in the model-building subset. Rules-based algorithms and those with the highest F scores were evaluated in the validation subset. F scores were compared across 1000 bootstrap samples. Misclassification was evaluated by calculating the odds of selection by the algorithm among true positives and true negatives. Results: A practical single-score algorithm derived from a logistic regression model had sensitivity = 78.22% and PPV = 78.50% (F score: 78.36). The algorithm was most likely to misclassify patients who were older (aged ≥80 years) or who had missing data in the SEER registry, shorter follow-up time in Medicare (<3 months), insurance through Veterans Affairs, more than one cancer in SEER, or certain Charlson comorbidities (dementia, chronic pulmonary disease, liver disease, or myocardial infarction). Conclusion: In this dataset, a practical point-based algorithm for identifying incident lung cancer demonstrated significant and substantial improvement over a current standard (absolute improvements of 7.9 and 23.9 percentage points in sensitivity and PPV, respectively).
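The evaluation metrics named above can be sketched in a few lines. This is a minimal illustration, not the authors' code, and it rests on several assumptions. First, the "F score" is taken to be the balanced F1 (the harmonic mean of sensitivity and PPV), which is consistent with the reported figures. Second, the bootstrap comparison is shown as a simple 95% percentile interval for the difference in F scores between two algorithms. Third, "odds of selection" is shown as an unadjusted 2×2 odds ratio for one subgroup; the study may have used adjusted regression models instead. All function names (`f_score`, `confusion_f_score`, `bootstrap_f_difference`, `selection_odds_ratio`) are hypothetical.

```python
import random


def f_score(sensitivity: float, ppv: float) -> float:
    """Balanced F score (assumed F1): harmonic mean of sensitivity and PPV."""
    if sensitivity + ppv == 0:
        return 0.0
    return 2 * sensitivity * ppv / (sensitivity + ppv)


def confusion_f_score(y_true, y_pred) -> float:
    """F1 from gold-standard labels (e.g. SEER case status) and algorithm flags."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    # 2TP / (2TP + FP + FN) equals the harmonic mean of sensitivity and PPV
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_f_difference(y_true, pred_a, pred_b, n_boot=1000, seed=0):
    """Resample patients with replacement; return an approximate 95% percentile
    interval for F(A) - F(B) plus the sorted bootstrap differences."""
    rng = random.Random(seed)
    n = len(y_true)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        fa = confusion_f_score(yt, [pred_a[i] for i in idx])
        fb = confusion_f_score(yt, [pred_b[i] for i in idx])
        diffs.append(fa - fb)
    diffs.sort()
    lo = diffs[int(0.025 * n_boot)]
    hi = diffs[int(0.975 * n_boot) - 1]
    return lo, hi, diffs


def selection_odds_ratio(y_true, y_pred, in_group, among_positive=True):
    """Unadjusted odds ratio of being selected by the algorithm for a subgroup
    (e.g. age >= 80) versus everyone else, restricted to true positives
    (among_positive=True) or true negatives (among_positive=False)."""
    a = b = c = d = 0
    for t, p, g in zip(y_true, y_pred, in_group):
        if bool(t) != among_positive:
            continue  # keep only the stratum being examined
        if g and p:
            a += 1  # subgroup, selected
        elif g:
            b += 1  # subgroup, not selected
        elif p:
            c += 1  # rest, selected
        else:
            d += 1  # rest, not selected
    if b == 0 or c == 0:
        return float("nan")  # undefined without a continuity correction
    return (a * d) / (b * c)
```

Plugging in the reported values, `f_score(78.22, 78.50)` gives about 78.36, matching the F score stated in the Results. An odds ratio well below 1 among true positives would indicate a subgroup the algorithm tends to miss.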