2021
DOI: 10.1101/2021.06.27.449937
Preprint

Pancreatic cancer risk predicted from disease trajectories using deep learning

Abstract: Pancreatic cancer is an aggressive disease that typically presents late with poor patient outcomes. There is a pronounced medical need for early detection of pancreatic cancer, which can be facilitated by identifying high-risk populations. Here we apply artificial intelligence (AI) methods to a large corpus of more than 6 million patient records spanning 40 years with 24,000 pancreatic cancer cases in the Danish National Patient Registry. In contrast to existing methods that do not use temporal information, we…

Cited by 12 publications (17 citation statements)
References 59 publications

“…Other researchers have used EHR data to develop PDAC risk prediction models for the general population [3,5,7,8,22]. Data set sizes range from 1,792 PDAC cases/1.8M controls [8] to 24,000 PDAC cases/6.2M controls [22].…”
Section: Related Work (mentioning)
confidence: 99%
“…While some studies work with data obtained from multiple organizations [7,8,22], none work with a federated data network that harmonizes and standardizes the data, none provides a clear path to clinical deployment, and none supports the seamless deployment of the model to new HCOs as they join the federated network. Some previous studies evaluate the ability of their models to identify high-risk individuals either until or shortly before the date of PDAC diagnosis [7,8,22], when clinical benefit is improbable. To focus on time frames in which detection of early stage disease and potential cure are most likely, we evaluate the ability of our models to identify high-risk patients at least six months before diagnosis.…”
Section: Related Work (mentioning)
confidence: 99%
“…Only a handful of research studies have used ML to build predictive models with EHR data in this field.11,15,16 These studies have demonstrated that by leveraging AI/ML and EHRs, subpopulations at high risk for PDAC can be identified 1 to 2 years before diagnosis. Such efforts also highlight specific challenges and opportunities for improving the secondary use of EHR data with AI and innovative data science solutions.…”
Section: Electronic Health Records (mentioning)
confidence: 99%
“…The AUROC from cross-application of the model on an external dataset, however, decreased to 0.78, which addresses the limitation of model generalizability, likely due to different coding practices across different health systems.15 The Med-BERT, a contextualized embedding model pre-trained on a structured EHR dataset of 28,490,650 patients, has shown some promise for establishing a generalizable AI model for medical/clinical applications. Med-BERT enables utilization of small local training datasets for realistic disease prediction tasks.…”
Section: Introduction (mentioning)
confidence: 99%