The mid p-value in exact tests for Hardy-Weinberg equilibrium

Background: PLINK 1 is a widely used open-source C/C++ toolset for genome-wide association studies (GWAS) and research in population genetics. However, the steady accumulation of data from imputation and whole-genome sequencing studies has exposed a strong need for faster and scalable implementations of key functions, such as logistic regression, linkage disequilibrium estimation, and genomic distance evaluation. In addition, GWAS and population-genetic data now frequently contain genotype likelihoods, phase information, and/or multiallelic variants, none of which can be represented by PLINK 1’s primary data format.

Findings: To address these issues, we are developing a second-generation codebase for PLINK. The first major release from this codebase, PLINK 1.9, introduces extensive use of bit-level parallelism, O(√n)-time/constant-space Hardy-Weinberg equilibrium and Fisher’s exact tests, and many other algorithmic improvements. In combination, these changes accelerate most operations by 1-4 orders of magnitude, and allow the program to handle datasets too large to fit in RAM. We have also developed an extension to the data format which adds low-overhead support for genotype likelihoods, phase, multiallelic variants, and reference vs. alternate alleles, which is the basis of our planned second release (PLINK 2.0).

Conclusions: The second-generation versions of PLINK will offer dramatic improvements in performance and compatibility. For the first time, users without access to high-end computing resources can perform several essential analyses of the feature-rich and very large genetic datasets coming into use.

Electronic supplementary material: The online version of this article (doi:10.1186/s13742-015-0047-8) contains supplementary material, which is available to authorized users.


“…Due to recent calls for use of mid-p adjustments in biostatistics [19,20], all of these functions have mid-p modes, and PLINK 1.9 exposes them.…”

Section: Results


Second-generation PLINK: rising to the challenge of larger and richer datasets

et al. 2015

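The PLINK 1.9 abstract above mentions O(√n)-time/constant-space Hardy-Weinberg exact tests. The Python sketch below illustrates the general idea only; it is not PLINK's code. It assumes the standard (Levene-Haldane) conditional distribution of the heterozygote count h given the allele counts, and the ratio recurrence P(h+2)/P(h) = 4·hom_r·hom_c/((h+1)(h+2)) described by Wigginton et al. (2005), where hom_r and hom_c are the two homozygote counts implied by h. The function name `hwe_exact_p`, the tie tolerance and the `tail_cut` cutoff are hypothetical choices. Probabilities are kept relative to the modal heterozygote count, so no array is stored, and each tail walk stops once terms are negligible. Because most of the mass lies within a few standard deviations (roughly of order √n) of the mode, usually only that window is visited.

```python
def hwe_exact_p(n_het, n_hom1, n_hom2, midp=False, tail_cut=1e-20):
    """Two-sided exact HWE p-value (or mid-p) using O(1) extra memory.

    Illustrative sketch, not PLINK's implementation: probabilities are kept
    relative to the modal heterozygote count and generated on the fly.
    """
    n = n_het + n_hom1 + n_hom2
    if n == 0:
        return 1.0
    rare = n_het + 2 * min(n_hom1, n_hom2)  # minor allele copies
    common = 2 * n - rare                   # major allele copies
    lo = rare % 2                           # smallest feasible het count

    def up(h):  # P(h + 2) / P(h)
        hom_r, hom_c = (rare - h) // 2, (common - h) // 2
        return 4.0 * hom_r * hom_c / ((h + 1) * (h + 2))

    def down(h):  # P(h - 2) / P(h)
        hom_r, hom_c = (rare - h) // 2, (common - h) // 2
        return h * (h - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))

    # Start at the rounded expected het count with the right parity, then
    # climb to the exact mode (the distribution is unimodal).
    mode = min(rare, round(rare * common / (2 * n - 1)))
    if (mode - rare) % 2:
        mode -= 1
    mode = max(lo, mode)
    while mode + 2 <= rare and up(mode) > 1.0:
        mode += 2
    while mode - 2 >= lo and down(mode) > 1.0:
        mode -= 2

    # Pass 1: probability of the observed het count relative to the mode.
    rel_obs, h = 1.0, mode
    while h != n_het:
        if h < n_het:
            rel_obs *= up(h)
            h += 2
        else:
            rel_obs *= down(h)
            h -= 2
        if rel_obs < 1e-300:
            return 0.0  # p-value is negligibly small at double precision

    # Pass 2: walk outward from the mode in both directions, accumulating the
    # normalising constant and the mass at least as extreme as observed.
    tol = 1e-7  # relative tolerance for treating two probabilities as tied
    total = more = tie = 0.0

    def tally(rel):
        nonlocal total, more, tie
        total += rel
        if rel < rel_obs * (1 - tol):
            more += rel
        elif rel <= rel_obs * (1 + tol):
            tie += rel

    tally(1.0)
    for ratio, delta in ((up, 2), (down, -2)):
        rel, h = 1.0, mode
        while lo <= h + delta <= rare:
            rel *= ratio(h)
            h += delta
            tally(rel)
            if rel < rel_obs * tail_cut:
                break  # past the mode terms only shrink; the rest is negligible
    p = more + 0.5 * tie if midp else more + tie
    return min(1.0, p / total)


print(hwe_exact_p(0, 1, 2), hwe_exact_p(0, 1, 2, midp=True))  # 0.2 0.1
```

For three individuals with no heterozygotes and homozygote counts 1 and 2, the only other feasible sample (two heterozygotes) is four times as likely, so the standard p-value is 0.25/1.25 = 0.2 and the mid p-value is half of that.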


“…Statistical tests for HWP with rare variants have low power (Emigh 1980; Wigginton et al. 2005; Graffelman and Moreno 2013) and thus there is less power to detect HWD in the African sample, and it is thus unsurprising that fewer significant results are observed in the YRI sample. Because the distribution of the minor allele frequency is different in each sample, the rates of significant variants are incommensurable.…”

Section: Discussion

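The low power for rare variants can be seen directly from the exact null distribution: with few copies of the minor allele there are only a handful of feasible heterozygote counts, so even the most extreme sample cannot give a small p-value. A minimal Python sketch, assuming the Levene-Haldane conditional distribution and a hypothetical tie tolerance; the helper names are illustrative and not taken from any package:

```python
import math


def het_null_probs(n, n_rare):
    """P(H = h | allele counts) under HWE for n individuals (Levene-Haldane).

    Assumes n_rare is the minor allele count, so 0 <= n_rare <= n.
    """
    n_common = 2 * n - n_rare
    const = (math.lgamma(n + 1) + math.lgamma(n_rare + 1)
             + math.lgamma(n_common + 1) - math.lgamma(2 * n + 1))
    probs = {}
    for h in range(n_rare % 2, n_rare + 1, 2):  # h has the parity of n_rare
        hom_r, hom_c = (n_rare - h) // 2, (n_common - h) // 2
        probs[h] = math.exp(const + h * math.log(2) - math.lgamma(hom_r + 1)
                            - math.lgamma(h + 1) - math.lgamma(hom_c + 1))
    return probs


def min_attainable_p(n, n_rare, tol=1e-7):
    """Smallest two-sided exact p-value any sample with these counts can give."""
    probs = het_null_probs(n, n_rare).values()
    # The p-value of an outcome is the total mass of outcomes no more likely than it.
    return min(sum(q for q in probs if q <= p * (1 + tol)) for p in probs)


for n_rare in (1, 2, 3, 5, 10, 20):
    print(n_rare, min_attainable_p(100, n_rare))
```

With 100 individuals and two copies of the minor allele, the only feasible samples have 0 or 2 heterozygotes, with probabilities 1/199 and 198/199, so no sample can reach p < 1/199 ≈ 0.005; with a single copy the p-value is always 1.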

“…2005). We therefore used the exact mid p value (Graffelman and Moreno 2013), now also available in the Plink program (Purcell et al. 2007), which has expectation 0.5 under the null, and, more importantly, provides for a test that has its rejection rate close to the nominal level.…”

Section: Methods

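The expectation of 0.5 under the null, mentioned in the snippet above, can be checked numerically. A minimal Python sketch, assuming the Levene-Haldane null distribution and defining the mid p-value as the probability of outcomes less likely than the observed one plus half the probability of equally likely outcomes (without ties, this is the standard p-value minus half the probability of the observed sample). Names and the tie tolerance are hypothetical. Written as a double sum over pairs of outcomes, each pair contributes its joint probability once in total (fully to the more likely outcome's p-value, or half to each if tied), so E[p_mid] = ½(Σ_h P(h))² = ½. In the absence of ties, the standard p-value also counts the observed outcome fully, giving E[p] = ½ + ½ Σ_h P(h)², which exceeds ½.

```python
import math


def het_null_probs(n, n_rare):
    """P(H = h | allele counts) under HWE (Levene-Haldane); n_rare <= n."""
    n_common = 2 * n - n_rare
    const = (math.lgamma(n + 1) + math.lgamma(n_rare + 1)
             + math.lgamma(n_common + 1) - math.lgamma(2 * n + 1))
    probs = {}
    for h in range(n_rare % 2, n_rare + 1, 2):
        hom_r, hom_c = (n_rare - h) // 2, (n_common - h) // 2
        probs[h] = math.exp(const + h * math.log(2) - math.lgamma(hom_r + 1)
                            - math.lgamma(h + 1) - math.lgamma(hom_c + 1))
    return probs


def exact_and_mid_p(probs, h_obs, tol=1e-7):
    """Standard and mid p-values, ranking outcomes by their null probability."""
    p_obs = probs[h_obs]
    more = sum(q for q in probs.values() if q < p_obs * (1 - tol))
    tie = sum(q for q in probs.values()
              if p_obs * (1 - tol) <= q <= p_obs * (1 + tol))
    return more + tie, more + 0.5 * tie


def hwe_p(n_aa, n_ab, n_bb):
    """(standard p, mid p) for observed genotype counts AA, AB, BB."""
    n = n_aa + n_ab + n_bb
    n_rare = n_ab + 2 * min(n_aa, n_bb)
    return exact_and_mid_p(het_null_probs(n, n_rare), n_ab)


def null_expectations(n, n_rare):
    """E[p] and E[mid p] when H is drawn from its own null distribution."""
    probs = het_null_probs(n, n_rare)
    e_std = e_mid = 0.0
    for h, q in probs.items():
        p_std, p_mid = exact_and_mid_p(probs, h)
        e_std += q * p_std
        e_mid += q * p_mid
    return e_std, e_mid


print(hwe_p(1, 0, 2))
print(null_expectations(50, 11))
```

For 50 individuals and 11 minor allele copies (chosen only as an example) the six feasible heterozygote counts have distinct probabilities, and the computed expectation of the mid p-value is 0.5 up to rounding, while that of the standard p-value is larger.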

A genome-wide study of Hardy–Weinberg equilibrium with next generation sequence data

2017

Self Cite


Statistical tests for Hardy–Weinberg equilibrium have been an important tool for detecting genotyping errors in the past, and remain important in the quality control of next generation sequence data. In this paper, we analyze complete chromosomes of the 1000 genomes project by using exact test procedures for autosomal and X-chromosomal variants. We find that the rate of disequilibrium largely exceeds what might be expected by chance alone for all chromosomes. Observed disequilibrium is, in about 60% of the cases, due to heterozygote excess. We suggest that most excess disequilibrium can be explained by sequencing problems, and hypothesize mechanisms that can explain exceptional heterozygosities. We report higher rates of disequilibrium for the MHC region on chromosome 6, regions flanking centromeres and p-arms of acrocentric chromosomes. We also detected long-range haplotypes and areas with incidental high disequilibrium. We report disequilibrium to be related to read depth, with variants having extreme read depths being more likely to be out of equilibrium. Disequilibrium rates were found to be 11 times higher in segmental duplications and simple tandem repeat regions. The variants with significant disequilibrium are seen to be concentrated in these areas. For next generation sequence data, Hardy–Weinberg disequilibrium seems to be a major indicator for copy number variation.

Electronic supplementary material: The online version of this article (doi:10.1007/s00439-017-1786-7) contains supplementary material, which is available to authorized users.


“…Genotype determinations were performed blind to psychopathological status of the twin pairs. Departure from Hardy-Weinberg equilibrium was tested in both the whole sample (115 pairs) and the depression concordant, discordant and control subsets of twins (6, 11 and 10 pairs) by using one genotype from every pair, and following a recently introduced methodology that is particularly suited for small sample sizes with low minor allele counts (Graffelman and Moreno 2013). The genotype distribution of the rs1360780 SNP was in Hardy-Weinberg equilibrium in all four cases; the p-values for equilibrium departure were 0.921 (whole UB sample), 0.136 (concordant), 0.068 (discordant) and 0.14 (healthy).…”

Section: Methods


FKBP5 modulates the hippocampal connectivity deficits in depression: a study in twins

Córdova‐Palomera, Reus, Fatjó‐Vilas, et al. 2016

Brain Imaging and Behavior


The hippocampus is a key modulator of stress responses underlying depressive behavior. While FKBP5 has been found associated with a large number of stress-related outcomes and hippocampal features, its potential role in modifying the hippocampal communication transfer mechanisms with other brain regions remains largely unexplored. The putative genetic or environmental roots of the association between depression and structural connectivity alterations of the hippocampus were evaluated combining diffusion weighted imaging with both a quantitative genetics approach and molecular information on the rs1360780 single nucleotide polymorphism, in a sample of 54 informative monozygotic twins (27 pairs). Three main results were derived from the present analyses. First, graph-theoretical measures of hippocampal connectivity were altered in depression. Specifically, decreased connectivity strength and increased network centrality of the right hippocampus were found in depressed individuals. Second, these hippocampal alterations are potentially driven by familial factors (genes plus shared environment). Third, there is an additive interaction effect between FKBP5's rs1360780 variant and the graph-theoretical metrics of hippocampal connectivity to influence depression risk. Our data reveals alterations of the communication patterns between the hippocampus and the rest of the brain in depression, effects potentially driven by overall familial factors (genes plus shared twin environment) and modified by the FKBP5 gene.


The mid p-value in exact tests for Hardy-Weinberg equilibrium

Abstract: The standard exact p-value is overly conservative, in particular for small minor allele frequencies. The mid p-value ameliorates this problem by bringing the rejection rate closer to the nominal level, at the price of occasionally exceeding the nominal level.
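The conservativeness described in this abstract can be quantified by computing, exactly and conditional on the allele counts, how often each test rejects under the null at a nominal level α. The Python sketch below assumes the Levene-Haldane null distribution, α = 0.05 and a sample of 100 individuals; these values, the tie tolerance and the function names are illustrative assumptions.

```python
import math


def het_null_probs(n, n_rare):
    """P(H = h | allele counts) under HWE (Levene-Haldane); n_rare <= n."""
    n_common = 2 * n - n_rare
    const = (math.lgamma(n + 1) + math.lgamma(n_rare + 1)
             + math.lgamma(n_common + 1) - math.lgamma(2 * n + 1))
    probs = {}
    for h in range(n_rare % 2, n_rare + 1, 2):
        hom_r, hom_c = (n_rare - h) // 2, (n_common - h) // 2
        probs[h] = math.exp(const + h * math.log(2) - math.lgamma(hom_r + 1)
                            - math.lgamma(h + 1) - math.lgamma(hom_c + 1))
    return probs


def null_rejection_rates(n, n_rare, alpha=0.05, tol=1e-7):
    """Exact P(reject | H0, allele counts) for the standard and mid-p tests."""
    probs = het_null_probs(n, n_rare)
    rate_std = rate_mid = 0.0
    for q_obs in probs.values():
        more = sum(q for q in probs.values() if q < q_obs * (1 - tol))
        tie = sum(q for q in probs.values()
                  if q_obs * (1 - tol) <= q <= q_obs * (1 + tol))
        if more + tie <= alpha:        # standard exact p-value
            rate_std += q_obs
        if more + 0.5 * tie <= alpha:  # mid p-value
            rate_mid += q_obs
    return rate_std, rate_mid


for n_rare in (2, 5, 10, 20, 50, 100):
    r_std, r_mid = null_rejection_rates(100, n_rare)
    print(f"minor allele count {n_rare:3d}: standard {r_std:.4f}, mid-p {r_mid:.4f}")
```

Because the mid p-value never exceeds the standard p-value, the mid-p test rejects at least as often. The standard test's rate can never exceed α, whereas the mid-p rate can, which is the trade-off the abstract describes. For two minor allele copies both rates equal 1/199 ≈ 0.005, about one tenth of the nominal level.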

Cited by 87 publications

References 32 publications

