2023
DOI: 10.1101/2023.02.05.23285192
Preprint

Developing and validating a pancreatic cancer risk model for the general population using multi-institutional electronic health records from a federated network

Abstract: Purpose: Pancreatic Duct Adenocarcinoma (PDAC) screening can enable detection of early-stage disease and long-term survival. Current guidelines are based on inherited predisposition; only about 10% of PDAC cases meet screening eligibility criteria. Electronic Health Record (EHR) risk models for the general population hold out the promise of identifying a high-risk cohort to expand the currently screened population. Using EHR data from a multi-institutional federated network, we developed and validated a PDAC r…

Cited by 1 publication (2 citation statements)
References: 25 publications

“…Based on a determination by the Western IRB, studies using TriNetX data are not considered to be human subject research, and are therefore exempt from IRB review. We adopted a methodology consistent with our prior study [27], with modifications tailored to the specific objectives of the current study. Briefly, we trained two classes of models: Neural Networks (NN) and Logistic Regression (LR), employed our feature selection algorithms to improve interpretability, conducted three types of internal-external validation, and simulated model deployment in a prospective setup.…”
Section: Methods (mentioning)
confidence: 99%
“…To limit the number of features, we discarded entry types that occurred in fewer than 1% of HCC cases in the training set. However, doing so still resulted in thousands of features, so we further employed L0 regularization on binary input mask [30] and iterative feature removal [27].…”
Section: Feature Extraction (mentioning)
confidence: 99%
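
The first filtering step quoted in the Feature Extraction statement (discarding entry types that occur in fewer than 1% of cases in the training set) can be illustrated with a minimal sketch. This is not the citing authors' code: the function name `frequent_entry_types`, the representation of each case as a set of entry-type codes, and the toy codes are all assumptions made for illustration. The L0-regularization and iterative feature removal steps are not shown.

```python
from collections import Counter


def frequent_entry_types(case_records, min_fraction=0.01):
    """Return entry types seen in at least `min_fraction` of case records.

    case_records: iterable of sets, one set of entry-type codes per case
    (a hypothetical representation, assumed for this sketch).
    """
    records = [set(r) for r in case_records]
    if not records:
        raise ValueError("need at least one case record")
    # Count, for each entry type, how many cases contain it at least once.
    counts = Counter(code for r in records for code in r)
    threshold = min_fraction * len(records)
    # Discard entry types occurring in fewer than `min_fraction` of cases.
    return {code for code, n in counts.items() if n >= threshold}


# Toy data: 200 hypothetical cases with made-up entry-type codes.
cases = [{"dx_A"} for _ in range(200)]
cases[0].add("dx_rare")      # present in 1 of 200 cases (0.5%)
for record in cases[:5]:
    record.add("lab_B")      # present in 5 of 200 cases (2.5%)

kept = frequent_entry_types(cases)
print(sorted(kept))
```

In this toy run `dx_rare` falls below the 1% threshold and is dropped, while `dx_A` and `lab_B` are retained.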