2020
DOI: 10.1093/gigascience/giaa077

VariantSpark: Cloud-based machine learning for association study of complex phenotype and large-scale genomic data

Abstract: Background: Many traits and diseases are thought to be driven by more than one gene (polygenic). Polygenic risk scores (PRS) hence expand on genome-wide association studies by taking multiple genes into account when risk models are built. However, PRS consider only the additive effect of individual genes, not epistatic interactions or the combination of individual and interacting drivers. While evidence of epistatic interactions is found in small datasets, large datasets have not been process…

citations
Cited by 15 publications
(27 citation statements)
references
References 35 publications
“…See Bayat et al (2020), where we introduced VariantSpark, a tailored Hadoop/Spark-based implementation of random forests, designed to handle data with extremely large number of variables per sample (O’Brien et al, 2018). In a test of the algorithm, VariantSpark was used to accurately predict ethnicity from whole genome sequencing profiles of 2,500 individuals and 80 million genomic variants each (1000 Genomes Project).…”
Section: Methods (mentioning)
confidence: 99%
“…In conjunction with RF implementations for high-dimensional data, e.g. VariantSpark Bayat et al (2020), Random Forest can deliver on its promise of offering higher performance for GWAS-approaches than other multi-locus models (lasso and the elastic net, (Friedman et al, 2010)) and single loci mapping methods (Michaelson et al, 2010).…”
Section: Methods (mentioning)
confidence: 99%