2021
DOI: 10.1371/journal.pone.0249305

Impact of variant-level batch effects on identification of genetic risk factors in large sequencing studies

Abstract: Genetic studies have shifted to sequencing-based rare variants discovery after decades of success in identifying common disease variants by Genome-Wide Association Studies using Single Nucleotide Polymorphism chips. Sequencing-based studies require large sample sizes for statistical power and therefore often inadvertently introduce batch effects because samples are typically collected, processed, and sequenced at multiple centers. Conventionally, batch effects are first detected and visualized using Principal …

Cited by 6 publications (6 citation statements)
References 35 publications (42 reference statements)
“…18 In our exploratory analyses of these data, we observed many variants that displayed large variation in allele frequencies across centers/platforms and contributed to spurious association signals with AD risk, that is, associations that passed at least the common suggestive significance for genome-wide association studies (p < 1 × 10⁻⁵) but were of a (likely) artifactual nature. Similar to the prior study, 19 we also observed that platform/center adjustment could not fully account for these signals.…”
supporting
confidence: 76%
“…Researchers may thus inspect filtered variants in targeted analyses in subsets of the ADSP data where no artifactual genotype enrichment is observed (e.g., excluding a single sequencing center/platform that showed an artifactual increase in genotype counts compared with the others, cf. Wickland et al. 19).…”
Section: Discussion
mentioning
confidence: 99%
“…In the present study, WGS data were analyzed to characterize the entire genome to identify early-onset AMI-specific markers, revealing 86% loci and 77% variants associated with early AMI that were not covered in SNP chips (Supplementary Material, Table S2 and S3). Moreover, the variants were strictly filtered to remove false positives caused by the batch effect which can lead to false disease association (36, 37). A batch effect was confirmed according to the sequencing year, and it was not successfully filtered by common variant filtering criteria such as missing genotype rate, minor allele frequency, and Hardy-Weinberg equilibrium (Supplementary Material, Fig.…”
Section: Discussion
mentioning
confidence: 99%
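
The citation statements above describe variants whose allele frequencies differ across sequencing batches yet survive the usual filters (missing genotype rate, minor allele frequency, Hardy-Weinberg equilibrium). The following is a minimal sketch of that situation, not the method of any of the cited papers. The function names, thresholds, and genotype counts are hypothetical, and simple chi-square approximations stand in for whatever tests the original studies used.

```python
import math


def chi2_sf_1df(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def passes_common_filters(n_ref_hom, n_het, n_alt_hom, n_missing,
                          max_missing=0.05, min_maf=0.01, min_hwe_p=1e-6):
    """Apply missingness, MAF, and HWE filters (illustrative thresholds)."""
    n_called = n_ref_hom + n_het + n_alt_hom
    total = n_called + n_missing
    if n_called == 0:
        return False
    if n_missing / total > max_missing:
        return False
    p = (2 * n_ref_hom + n_het) / (2 * n_called)  # reference allele frequency
    q = 1.0 - p
    maf = min(p, q)
    if maf == 0 or maf < min_maf:
        return False
    # Chi-square goodness-of-fit test against Hardy-Weinberg proportions.
    expected = (n_called * p * p, 2 * n_called * p * q, n_called * q * q)
    observed = (n_ref_hom, n_het, n_alt_hom)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2_sf_1df(stat) >= min_hwe_p


def batch_allele_pvalue(alt_a, total_a, alt_b, total_b):
    """2x2 chi-square test of alternate-allele counts between two batches."""
    n = total_a + total_b
    alt = alt_a + alt_b
    ref = n - alt
    if alt == 0 or ref == 0:
        return 1.0
    stat = 0.0
    for obs, row, col in ((alt_a, total_a, alt), (total_a - alt_a, total_a, ref),
                          (alt_b, total_b, alt), (total_b - alt_b, total_b, ref)):
        exp = row * col / n
        stat += (obs - exp) ** 2 / exp
    return chi2_sf_1df(stat)


# Hypothetical genotype counts (ref-hom, het, alt-hom) for one variant in two batches.
batch_a = (900, 98, 2)
batch_b = (800, 190, 10)
pooled = tuple(a + b for a, b in zip(batch_a, batch_b))

pooled_ok = passes_common_filters(*pooled, n_missing=0)
alt_a = batch_a[1] + 2 * batch_a[2]
alt_b = batch_b[1] + 2 * batch_b[2]
p_batch = batch_allele_pvalue(alt_a, 2 * sum(batch_a), alt_b, 2 * sum(batch_b))
print(pooled_ok, p_batch)
```

In this made-up example the pooled counts pass all three common filters, while the between-batch allele-frequency test is highly significant, which is the kind of variant-level discrepancy the quoted passages report.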