2020
DOI: 10.1101/2020.08.10.244293
Preprint

The MRC IEU OpenGWAS data infrastructure

Abstract: Data generated by genome-wide association studies (GWAS) are growing fast with the linkage of biobank samples to health records, and expanding capture of high-dimensional molecular phenotypes. However, the utility of these efforts can only be fully realised if their complete results are collected from their heterogeneous sources and formats, harmonised and made programmatically accessible. Here we present the OpenGWAS database, an open source, open access, scalable and high-performance cloud-based data infrastr…

Citations: cited by 673 publications (637 citation statements)
References: 39 publications
“…We also conducted split-sample GWAS and Mendelian randomization analysis using UK Biobank data, in which we randomly split UK Biobank into halves, and for each half conducted a GWAS for each health condition and risk factor using the MRC IEU UK Biobank GWAS pipeline. 35 The results of the two GWASs were used to create PRSs for the other half of UK Biobank avoiding sample overlap, 36 and we repeated the Mendelian randomization analysis with the two PRSs separately, then combined the two results with fixed-effect meta-analysis to give a single estimate. The split-sample analysis: (i) allowed us to analyse lifetime smoking, as this has only been generated in UK Biobank, and thus no previous GWAS could have been used to inform the PRS; (ii) allowed us to potentially increase the size and power of the GWASs, possibly improving the predictive ability of the PRSs; and (iii) guaranteed homogeneity of the GWASs and analysis populations, which removes the potential bias from using data from an external GWAS to inform the creation of the PRSs, for example, through differences in populations giving different effects of SNPs.…”
Section: Methods (citation type: mentioning)
confidence: 99%
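The final step in the statement above, combining the two split-sample estimates with fixed-effect meta-analysis, can be sketched as follows. This is a minimal illustration assuming inverse-variance weighting, which is the usual fixed-effect approach; the quoted passage does not state the weighting used, and the function name and input numbers are hypothetical.

```python
import math


def fixed_effect_meta(estimates, std_errors):
    """Pool estimates by inverse-variance weighting (assumed method).

    Returns the pooled estimate and its standard error.
    """
    # Each estimate is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical estimates from the two halves (not taken from any study).
beta, se = fixed_effect_meta([0.10, 0.14], [0.04, 0.05])
print(f"pooled estimate = {beta:.4f}, standard error = {se:.4f}")
```

With equal standard errors the pooled estimate reduces to the simple mean of the two halves, and its standard error shrinks by a factor of the square root of two.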
“…Participants from the UK Biobank were randomly allocated to one of two split halves of the genetic data. We then generated lifetime smoking scores in sample one of these two samples, and ran a GWAS with the UK Biobank pipeline, 38 following the exact method as described elsewhere. 24…”
Section: Methods (citation type: mentioning)
confidence: 99%
“…[7] All GWASs were conducted using the MRC Integrative Epidemiology Unit Pipeline with a BOLT-LMM model to account for population stratification. [15] All six GWASs were adjusted for age, sex and 40 principal components.…”
Section: Methods (citation type: mentioning)
confidence: 99%