2020
DOI: 10.1002/pds.5137

Development and validation of coding algorithms to identify patients with incident lung cancer in United States healthcare claims data

Abstract: Purpose: Our aim was to develop and validate a practical US healthcare claims algorithm for identifying incident lung cancer that improves on positive predictive value (PPV) and sensitivity observed in past studies. Methods: Patients newly diagnosed with lung cancer in Surveillance, Epidemiology, and End Results (SEER) (gold standard) were linked with Medicare claims. A 5% Medicare "other cancer" sample and noncancer sample served as controls. A split-sample validation approach was used. Rules-based, regressio…

Cited by 6 publications (11 citation statements)
References 19 publications

“…However, algorithms to identify incident cases exist for many other cancer types. [16][17][18][19][20][21][22][23] Our algorithm is similar to [that] proposed by Nattinger et al. [16] to identify incident cases of breast cancer, which likewise used an initial diagnosis code to identify breast cancer, a secondary step to identify cases with high likelihood of being true breast cancer (using procedural codes instead of chemotherapy and surgical claims), excluded nonbreast cancer cases, and removed prevalent cancer cases. Other algorithms to identify incident cancer cases use logistic regression models to derive predictor variables [19,21] or require various elements of inpatient and outpatient diagnosis codes, procedures, and/or laboratory values within defined time points.…”
Section: Discussion (mentioning)
confidence: 99%
“…In particular, there is the potential for inaccurate or missing data and a paucity of validated clinical and pathologic data, which creates challenges for accurate cohort selection for research purposes. [5,14,15] Although previous studies have developed algorithms using claims data to identify cases of multiple other types of cancers, [16][17][18][19][20][21][22][23][24] currently, there is no algorithm to identify incident ovarian cancer cases. As research into ovarian cancer using administrative databases continues, it is imperative that these cases can be correctly ascertained from claims databases to make accurate research inferences and conclusions.…”
Section: Introduction (mentioning)
confidence: 99%
“…Full details of the data source used in this study were published previously. [15] The protocol was reviewed and considered exempt by the Quorum Review institutional review board prior to approval by the National Cancer Institute for SEER-Medicare data use.…”
Section: Methods (mentioning)
confidence: 99%
“…We searched for candidate variables in the pre- and post-index periods in Medicare claims. Based on the algorithms that performed best in our previous study, [15] the following models were explored in this study: a single logistic regression model, a single logistic regression model with interactions, gradient boosting, and neural networks. The multilayer perceptron neural network had two hidden layers in the network and two neurons in each hidden layer because five hidden nodes did not improve the F-score in the model building set.…”
Section: Methods (mentioning)
confidence: 99%
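
The statement above compares candidate models by F-score, and the abstract reports PPV and sensitivity. As a rough illustration of how the three measures relate, here is a minimal sketch in Python. It assumes binary labels where 1 marks an incident case (per the gold standard) and a balanced F-score (beta = 1), since the excerpt does not say which weighting was used. The function name and the toy labels are hypothetical and are not taken from the cited studies.

```python
def ppv_sensitivity_fscore(y_true, y_pred, beta=1.0):
    """Return (PPV, sensitivity, F-score) for binary labels.

    y_true: gold-standard labels (1 = incident case, 0 = control).
    y_pred: algorithm output (1 = flagged as incident case).
    beta:   weight of sensitivity relative to PPV; 1.0 is an assumption.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard against empty denominators (no flagged cases, or no true cases).
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0

    denom = beta ** 2 * ppv + sensitivity
    fscore = (1 + beta ** 2) * ppv * sensitivity / denom if denom else 0.0
    return ppv, sensitivity, fscore


# Toy example (fabricated labels for illustration only, not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
ppv, sensitivity, fscore = ppv_sensitivity_fscore(y_true, y_pred)
print(ppv, sensitivity, fscore)
```

In a split-sample design, a function like this would be applied to predictions on the held-out validation set, so that a model chosen for its F-score on the building set is not judged on the same data.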