2020
DOI: 10.1101/2020.01.22.915397
Preprint

Medical data and machine learning improve power of stroke genome-wide association studies

Abstract: Genome-wide association studies (GWAS) may require enrollment of up to millions of participants to power variant discovery. This requires manual curation of cases and controls with large-scale collaborations. Biobanks connected to electronic health records (EHR) can facilitate these studies by using data from clinical care systems, like billing diagnosis codes, as phenotypes. These systems, however, do not define adjudicated cases and controls. Machine learning can add nuance to these definitions. We developed Q…

Cited by 4 publications (3 citation statements)
References 57 publications (78 reference statements)
“…Future work that uses indGWAS should attempt to maintain the cohort as much as possible. An applicable method for this challenge is QTPhenProxy, in which the phenotype definition is replaced by an estimated phenotype probability using regression on other variables [14]. For the phecode case/control example, a regression could be fit on case/control definitions before being evaluated on the whole cohort.…”
Section: Discussion
mentioning
confidence: 99%
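The idea in the statement above (fit a regression on the participants who have case/control definitions, then score the whole cohort) can be sketched as follows. This is a minimal illustration only, assuming a plain logistic regression trained by gradient descent on synthetic data; it is not the QTPhenProxy implementation, and every name, feature, and parameter in it is hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    """Estimated phenotype probability for each participant."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


# Synthetic cohort (assumption): two made-up EHR-derived covariates per person.
random.seed(0)
cohort = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(300)]

# Hypothetical ground-truth status, used here only to generate labels.
status = [
    1 if 2.0 * x[0] - 1.0 * x[1] + random.gauss(0, 0.5) > 0 else 0
    for x in cohort
]

# Assume only the first 100 participants have a case/control definition.
labeled_X, labeled_y = cohort[:100], status[:100]

# Fit on the labeled subset, then evaluate on the whole cohort so that
# nobody is dropped for lacking a definition.
w, b = fit_logistic(labeled_X, labeled_y)
proxy = predict_proba(cohort, w, b)
```

The resulting `proxy` list holds one probability per participant and could stand in for the binary phenotype as a quantitative trait in a downstream association test, which is the cohort-preserving property the statement points to.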
“…Although previous studies have identified thousands of disease-associated loci, researchers have yet to deeply explore the joint contribution of family history, clinical measures, and lifestyle factors to model disease liabilities. Machine learning techniques have advanced in recent years, allowing to reveal patterns in massive, high-dimensional datasets and capturing non-linear relationships 18,19. Our study seeks to improve genetic prediction with deep learning-based estimates of disease liability probabilities that leverage joint semantic and structure-based embeddings of phenotypes in the UKBB.…”
Section: Introduction
mentioning
confidence: 99%
“…'& ), consistent with the known pathophysiology of disorder. Interestingly, there was a significant interaction effect between the cryptic phenotype PGS and P/LP carrier status (bPGSxP/LP=0 50.…”
mentioning
confidence: 99%