2021
DOI: 10.1093/hmg/ddab261

A data harmonization pipeline to leverage external controls and boost power in GWAS

Abstract: The use of external controls in genome-wide association studies (GWAS) can significantly increase the size and diversity of the control sample, enabling high-resolution ancestry matching and enhancing the power to detect association signals. However, the aggregation of controls from multiple sources is challenging due to batch effects, difficulty in identifying genotyping errors, and the use of different genotyping platforms. These obstacles have impeded the use of external controls in GWAS and can lead to spuri…

Cited by 7 publications (13 citation statements)
References: 36 publications
“…Controls should be matched to cases with the same genotyping platform, the same variant calling, quality-control metrics and analysis pipeline, and imputed together with the same variants and reference panel [ 43 ]. Close attention should be paid to possible structural differences in the data caused by different laboratories and/or study populations [ 17 20 ]. These issues are generic to borrowing controls, irrespective of whether one borrows 1 or 100 control(s)/case.…”
Section: Discussion (mentioning)
confidence: 99%
“…For R² bins in between, AF diff progressively increases as imputation quality decreases (Supp Fig 1). For example, the 514 EUR AFFY samples with WGS had 2.10% of SNPs (7,064,852) with AF diff ≥ 1% in S1, decreasing to 1.98% in S2. Looking at a more relaxed threshold of AF diff ≥ 2%, S1 had 0.39% of SNPs meeting this criterion whereas S2 slightly increased to 0.56% (Supp Table 2).…”
Section: Tsim Is Robust Against Imputation-derived Errors (mentioning)
confidence: 99%
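
The thresholded AF diff percentages in the excerpt above can be illustrated with a small sketch. It assumes that AF diff is the absolute difference between a SNP's allele frequency in the imputed data and in the WGS data, which the excerpt does not spell out. The function name and the toy frequencies are hypothetical and are not taken from the cited paper.

```python
def fraction_exceeding(af_imputed, af_wgs, threshold):
    """Return the fraction of SNPs with |AF_imputed - AF_wgs| >= threshold.

    Assumption: both lists hold per-SNP allele frequencies in the same order.
    """
    if len(af_imputed) != len(af_wgs):
        raise ValueError("frequency lists must be the same length")
    if not af_imputed:
        return 0.0
    hits = sum(1 for a, b in zip(af_imputed, af_wgs) if abs(a - b) >= threshold)
    return hits / len(af_imputed)


# Toy allele frequencies for five hypothetical SNPs (illustrative only).
imputed = [0.100, 0.250, 0.400, 0.550, 0.800]
wgs = [0.100, 0.245, 0.385, 0.520, 0.800]

share_1pct = fraction_exceeding(imputed, wgs, 0.01)  # AF diff >= 1%
share_2pct = fraction_exceeding(imputed, wgs, 0.02)  # AF diff >= 2%
print(share_1pct, share_2pct)
```

Raising the threshold can only keep or shrink the share of SNPs that meet it, which is why the 2% figures in the excerpt are smaller than the 1% figures.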
“…When combining cohorts, batch effects can arise due to differences in genotyping platform. To reduce these effects, the current practice for combining cohorts prior to imputation is to use genotyped single nucleotide polymorphisms (SNPs) that are shared between cohorts [1,47]. This approach not only reduces the number of SNPs available for analysis but may also adversely affect genotype imputation by decreasing accuracy.…”
Section: Introduction (mentioning)
confidence: 99%
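
The shared-SNP practice described in this excerpt amounts to a set intersection over variant identifiers. The following is a minimal sketch, assuming each cohort is represented simply as a collection of variant IDs; the function name and the rsIDs are made up for illustration and do not come from the cited work.

```python
def shared_snps(*cohorts):
    """Return the sorted variant IDs that appear in every cohort.

    Assumption: each cohort is an iterable of variant ID strings.
    """
    if not cohorts:
        return []
    common = set(cohorts[0])
    for cohort in cohorts[1:]:
        common &= set(cohort)  # keep only IDs also present in this cohort
    return sorted(common)


# Hypothetical genotyped SNP lists from three platforms (illustrative only).
platform_a = ["rs1", "rs2", "rs3", "rs4"]
platform_b = ["rs2", "rs3", "rs5"]
platform_c = ["rs3", "rs2", "rs6"]

kept = shared_snps(platform_a, platform_b, platform_c)
print(kept)
```

Each added platform can only keep or shrink the intersection, which is the loss of SNPs that the excerpt identifies as a drawback of this approach.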
“…al. have noted similar problems where an “aggregation of controls from multiple sources is challenging due to batch effects, difficulty in identifying genotyping errors and the use of different genotyping platforms” [ 192 ].…”
Section: Proposed Solutions (mentioning)
confidence: 99%