2022
DOI: 10.1101/2022.08.15.503991
Preprint

Deep Learning-based Phenotype Imputation on Population-scale Biobank Data Increases Genetic Discoveries

Abstract: Biobanks that collect deep phenotypic and genomic data across large numbers of individuals have emerged as a key resource for human genetic research. However, phenotypes acquired as part of Biobanks are often missing across many individuals, limiting the utility of these datasets. The ability to accurately impute or "fill-in" missing phenotypes is critical to harness the power of population-scale Biobank datasets. We propose AutoComplete, a deep learning-based imputation method which can accurately impute mis…

Cited by 11 publications (14 citation statements)
References 59 publications
“…Finally, we applied a new deep-learning imputation method, AutoComplete 28, to the same phenotype matrix (Methods). AutoComplete improved estimated imputation accuracy for most phenotypes with >10% missingness (29/42), and increased average estimated R² by 2.9%.…”
Section: Results (mentioning)
confidence: 99%
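One common way to arrive at an "estimated" imputation R² like the one quoted above is to hide some observed entries, impute them, and compare the imputed values with the hidden truth. The sketch below illustrates that idea only. It is an assumption that the citing paper evaluates accuracy this way, and the imputer here is a simple iterative low-rank SVD fill used as a stand-in, not AutoComplete. The names `iterative_svd_impute` and `heldout_r2`, and the synthetic matrix, are hypothetical.

```python
import numpy as np

def iterative_svd_impute(X, rank=2, n_iter=50):
    """Fill NaNs with a rank-`rank` SVD reconstruction, refined iteratively.
    Stand-in imputer for illustration; this is NOT AutoComplete."""
    missing = np.isnan(X)
    # Start from column means computed over the observed entries.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep observed values fixed; update only the missing cells.
        filled = np.where(missing, low_rank, X)
    return filled

def heldout_r2(X, col, frac=0.2, seed=0):
    """Hide a fraction of the observed entries in one column, impute them,
    and return the squared correlation between truth and imputation."""
    rng = np.random.default_rng(seed)
    observed = np.flatnonzero(~np.isnan(X[:, col]))
    n_hide = max(2, int(frac * observed.size))
    hide = rng.choice(observed, size=n_hide, replace=False)
    X_masked = X.copy()
    X_masked[hide, col] = np.nan
    imputed = iterative_svd_impute(X_masked)
    r = np.corrcoef(X[hide, col], imputed[hide, col])[0, 1]
    return r ** 2

# Synthetic, made-up "phenotype matrix": 200 individuals, 6 correlated traits.
rng = np.random.default_rng(1)
Z = rng.normal(size=(200, 2))                      # two latent factors
W = np.array([[1.0, 0.5, -1.0, 2.0, 0.3, 1.0],
              [-0.5, 1.0, 1.0, 0.5, 2.0, -1.0]])   # fixed trait loadings
X = Z @ W + 0.1 * rng.normal(size=(200, 6))        # small measurement noise

r2 = heldout_r2(X, col=0)
print(f"held-out R^2 for trait 0: {r2:.3f}")
```

Repeating this per phenotype and averaging gives a single summary figure, which can then be compared between two imputation methods on the same matrix.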
“… 34, 35 However, since PRS-CS, as many other model-based PRS methods, assumes linear effects of some specific SNPs on a trait, its imputed trait values are based on the estimated linear effects of the selected SNPs (which may not be truly causal or associated ones) and accordingly are not suitable for subsequent association analyses, though they are useful for prediction. Relatedly, although other methods have been proposed to impute a few missing values of a focal trait using other traits, 37, 38, 39 they are not suitable for our purpose of large-scale trait imputation for downstream genetic association analysis because of the loss of specificity: by definition, any genetic variants associated with a trait used to impute the focal trait are expected to be associated with the imputed focal trait, even not truly associated with the (observed) focal trait.…”
Section: Discussion (mentioning)
confidence: 99%
“…The STAND eligibility criteria and treatment protocol are described extensively elsewhere 29. Briefly, participants are initially assessed using the Computerized Adaptive Testing Depression Inventory 31 (CAT-DI), an online adaptive tool that offers validated assessments of depression severity (measured on a 0-100 scale). After the initial assessment, participants are routed to appropriate treatment resources depending on depression severity: those with mild (35 ≤ CAT-DI < 65) to moderate (65 ≤ CAT-DI < 75) depression at baseline received online support with or without peer coaching 30 while those with severe depression (CAT-DI ≥ 75) received in-person care from a clinician (Materials and Methods).…”
Section: Results (mentioning)
confidence: 99%
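The severity bands in the statement above can be written as a small lookup, shown here as a minimal sketch. The function name `route_by_cat_di` and the returned labels are hypothetical. The quoted passage does not say how scores below 35 are handled, so that case is marked as unspecified here and no rule is invented for it.

```python
def route_by_cat_di(score: float) -> tuple[str, str]:
    """Map a CAT-DI score (0-100 scale) to (severity band, treatment track),
    using only the thresholds given in the quoted passage."""
    if not 0 <= score <= 100:
        raise ValueError("CAT-DI scores are reported on a 0-100 scale")
    if score >= 75:
        return "severe", "in-person care from a clinician"
    if score >= 65:
        return "moderate", "online support with or without peer coaching"
    if score >= 35:
        return "mild", "online support with or without peer coaching"
    # Assumption: scores below 35 are not covered by the quoted passage.
    return "below mild threshold", "not specified in the quoted passage"

for s in (20, 35, 64.9, 65, 75):
    print(s, route_by_cat_di(s))
```

The checks run from the highest band down, so each boundary value (35, 65, 75) falls into the band whose lower bound it equals, matching the inequalities as quoted.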
“…In total, we obtained 1,325 features. Missing daily feature values (Sup Figure 4) were imputed using two different imputation methods, AutoComplete 31 and softImpute 32 (Materials and Methods), resulting in 29,254 days of logging events across all individuals.…”
Section: Digital Behavioral Phenotypes Capture Changes In Behavior (mentioning)
confidence: 99%