Multiple testing is a challenging issue in genetic association studies using large numbers of single nucleotide polymorphism (SNP) markers, many of which exhibit linkage disequilibrium (LD). Failure to adjust for multiple testing appropriately may produce excessive false positives or overlook true positive signals. The Bonferroni method of adjusting for multiple comparisons is easy to compute, but is well known to be conservative in the presence of LD. On the other hand, permutation-based corrections can correctly account for LD among SNPs, but are computationally intensive. In this work, we propose a new multiple testing correction method for association studies using SNP markers. Using both simulated and real data, we show that it is simple, fast, and more accurate than recently developed methods, and that it is comparable to permutation-based corrections. We also demonstrate how it might be used in whole-genome association studies to control type I error. The efficiency and accuracy of the proposed method make it an attractive choice for multiple testing adjustment when there is high intermarker LD in the SNP data set.
Genet. Epidemiol. 32:361-369, 2008. © 2008 Wiley-Liss, Inc.
Key words: single nucleotide polymorphism; composite linkage disequilibrium; multiple testing correction; principal component analysis; eigenvalues
INTRODUCTION
Multiple testing is a challenging issue for genetic data analysis. Candidate gene and genome-wide association studies involve statistical testing of not just a single hypothesis, but many. Even when the point-wise error rate (PWER, $\alpha_p$) is set to a low level, the experiment-wise error rate (EWER, $\alpha_e$) increases with the number of tests carried out. For this reason, strict significance thresholds have been recommended to control EWER [Risch and Merikangas, 1996]. However, an overly conservative approach may result in overlooking true positive signals, while an overly liberal criterion could produce excessive false positives. Šidák and Bonferroni corrections are popular approaches for controlling $\alpha_e$ by specifying what $\alpha_p$ value should be used for each individual test. The Šidák correction is calculated as $\alpha_p = 1 - (1 - \alpha_e)^{1/N}$, where $N$ is the number of individual hypotheses to be tested [Šidák, 1967]. This correction assumes that the hypothesis tests are independent. Noting that $(1 - \alpha_p)^N \approx 1 - N\alpha_p$ for small $\alpha_p$, we obtain the Bonferroni correction as $\alpha_p = \alpha_e / N$ [Bonferroni, 1935, 1936], which is an approximation to the Šidák correction.

Recently, single nucleotide polymorphisms (SNPs), which are often densely genotyped, have become popular markers for genetic association studies. Closely spaced SNPs are frequently highly correlated because of extensive linkage disequilibrium (LD) among them [Wall and Pritchard, 2003]. Therefore, when association studies are conducted with many SNPs, the tests performed on each SNP are usually not independent, depending on the correlation structure among the SNPs. This violation of the independence assumption limits the Šidák and Bonferron...