The Hardy-Weinberg law plays an important role in the field of population genetics and often serves as a basis for genetic inference. Because of its importance, much attention has been devoted over the decades to tests of Hardy-Weinberg proportions (HWP). It has long been recognized that large-sample goodness-of-fit tests can sometimes lead to spurious results when the sample size and/or some genotypic frequencies are small. Although a complete enumeration algorithm for the exact test has been proposed, it is not of practical use for loci with more than a few alleles because of the amount of computation required. We propose two algorithms to estimate the significance level for a test of HWP. The algorithms are easily applicable to loci with multiple alleles. Both are remarkably simple and computationally fast. The relative efficiency and merits of the two algorithms are compared, guidelines for their use are given, and numerical examples illustrate the practicality of the algorithms.
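A Monte Carlo estimate of the exact-test significance level can be obtained by sampling genotype tables from their null distribution conditional on the allele counts. The following is an illustrative sketch and is not necessarily either of the authors' two algorithms. It assumes integer allele labels, uses the conditional probability of the observed table (Levene's formula) as the test statistic, and the function names are hypothetical. Under HWP, randomly pairing the 2n sampled alleles reproduces the conditional null distribution. Shuffling and re-pairing therefore gives a permutation-style estimator that works unchanged for any number of alleles.

```python
import math
import random
from collections import Counter
from typing import Sequence, Tuple

Genotype = Tuple[int, int]  # unordered pair of allele labels


def log_cond_prob(genotypes: Sequence[Genotype]) -> float:
    """Log P(genotype counts | allele counts) under HWP.

    P = n! * prod(n_i!) * 2**H / ((2n)! * prod(n_ij!)), where n is the number
    of individuals, n_i the allele counts, n_ij the genotype counts and H the
    number of heterozygotes.
    """
    n = len(genotypes)
    geno_counts = Counter(tuple(sorted(g)) for g in genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    het = sum(c for (a, b), c in geno_counts.items() if a != b)
    logp = math.lgamma(n + 1) - math.lgamma(2 * n + 1) + het * math.log(2)
    logp += sum(math.lgamma(c + 1) for c in allele_counts.values())
    logp -= sum(math.lgamma(c + 1) for c in geno_counts.values())
    return logp


def hwp_pvalue_mc(genotypes: Sequence[Genotype], n_sim: int = 10_000,
                  seed: int = 0) -> float:
    """Monte Carlo estimate of the exact-test p-value.

    Shuffling the pooled alleles and re-pairing them draws a genotype table
    from the null distribution given the allele counts; the p-value is the
    fraction of simulated tables no more probable than the observed one.
    """
    rng = random.Random(seed)
    observed = log_cond_prob(genotypes)
    alleles = [a for g in genotypes for a in g]
    n = len(genotypes)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(alleles)
        simulated = [(alleles[2 * i], alleles[2 * i + 1]) for i in range(n)]
        # small tolerance so floating-point ties count as "at least as extreme"
        if log_cond_prob(simulated) <= observed + 1e-9:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    # Toy three-allele sample with an excess of homozygotes (illustrative only).
    sample = ([(0, 0)] * 12 + [(1, 1)] * 10 + [(2, 2)] * 6
              + [(0, 1)] * 4 + [(1, 2)] * 3)
    print(hwp_pvalue_mc(sample, n_sim=5_000))
```

The estimate is a binomial proportion, so its Monte Carlo standard error is roughly sqrt(p(1 - p) / n_sim). This gives a rough guide for choosing `n_sim`.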
Copy number variants (CNVs) are associated with many neurocognitive disorders; however, these events are typically large and the underlying causative gene is unclear. We created an expanded CNV morbidity map from 29,085 children with developmental delay versus 19,584 healthy controls, identifying 70 significant CNVs. We resequenced 26 candidate genes in 4,716 additional cases with developmental delay or autism and 2,193 controls. An integrated analysis of CNV and single-nucleotide variant (SNV) data pinpointed ten genes enriched for putative loss of function. Patient follow-up on a subset identified new clinical subtypes of pediatric disease and the genes responsible for disease-associated CNVs. These include haploinsufficiency of SETBP1, associated with intellectual disability and loss of expressive language, and truncations of ZMYND11 in patients with autism, aggression and complex neuropsychiatric features. This combined CNV and SNV approach facilitates the rapid discovery of new syndromes and neuropsychiatric disease genes despite extensive genetic heterogeneity.
Maximum likelihood estimates (MLEs) in autologistic models and other exponential family models for dependent data can be calculated with Markov chain Monte Carlo methods (the Metropolis algorithm or the Gibbs sampler), which simulate ergodic Markov chains whose equilibrium distributions lie in the model. From one realization of such a Markov chain, a Monte Carlo approximant to the whole likelihood function can be constructed. The parameter value (if any) maximizing this function approximates the MLE. When no parameter point in the model maximizes the likelihood, the MLE in the closure of the exponential family may exist and can be calculated by a two-phase algorithm: first find the support of the MLE by linear programming, then find the distribution within the family conditioned on that support by maximizing the likelihood for that family. These methods are illustrated with a constrained autologistic model for DNA fingerprint data. MLEs are compared with maximum pseudolikelihood estimates (MPLEs) and maximum conditional likelihood estimates (MCLEs), neither of which produces acceptable estimates: the MPLE because it overestimates dependence, and the MCLE because conditioning removes the constraints.
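The Monte Carlo approximant can be written down directly. Take an exponential family with sufficient statistic t(x), and let X_1, ..., X_N be draws from a chain whose equilibrium distribution is the model at a reference point psi. Then l(theta) - l(psi) is approximated by (theta - psi)·t(x_obs) - log[(1/N) Σ_i exp((theta - psi)·t(X_i))]. The following is a minimal sketch under stated assumptions, not the paper's constrained DNA-fingerprint model. The toy model is a hypothetical one-dimensional autologistic chain with x_i in {0, 1} and statistics (Σ x_i, Σ x_i x_{i+1}), sampled by a Gibbs sampler. It maximizes over a coarse grid rather than with a numerical optimizer.

```python
import math
import random
from typing import List, Sequence, Tuple

Params = Tuple[float, float]  # (alpha, beta): main effect, neighbour interaction
Stats = Tuple[int, int]       # (sum of x_i, sum of x_i * x_{i+1})


def suff_stats(x: Sequence[int]) -> Stats:
    return sum(x), sum(x[i] * x[i + 1] for i in range(len(x) - 1))


def gibbs_stats(psi: Params, n_sites: int, n_draws: int,
                burn_in: int = 500, seed: int = 0) -> List[Stats]:
    """Gibbs sampler for a toy 1-D autologistic chain; records t(X) per sweep."""
    rng = random.Random(seed)
    alpha, beta = psi
    x = [rng.randint(0, 1) for _ in range(n_sites)]
    out: List[Stats] = []
    for sweep in range(burn_in + n_draws):
        for i in range(n_sites):
            nbrs = (x[i - 1] if i > 0 else 0) + (x[i + 1] if i < n_sites - 1 else 0)
            eta = alpha + beta * nbrs  # conditional log-odds that x_i = 1
            x[i] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0
        if sweep >= burn_in:
            out.append(suff_stats(x))
    return out


def mc_loglik_ratio(theta: Params, psi: Params, t_obs: Stats,
                    sampled: Sequence[Stats]) -> float:
    """Monte Carlo approximation of l(theta) - l(psi) from draws made at psi."""
    d = (theta[0] - psi[0], theta[1] - psi[1])
    terms = [d[0] * t[0] + d[1] * t[1] for t in sampled]
    m = max(terms)  # log-sum-exp shift for numerical stability
    log_mean = m + math.log(sum(math.exp(v - m) for v in terms) / len(terms))
    return d[0] * t_obs[0] + d[1] * t_obs[1] - log_mean


if __name__ == "__main__":
    psi = (-0.5, 1.0)  # reference parameter (arbitrary choice)
    # Synthetic "observed" data: one realization from the toy model at a known theta.
    t_obs = gibbs_stats((-0.3, 1.2), n_sites=40, n_draws=1, seed=42)[0]
    sampled = gibbs_stats(psi, n_sites=40, n_draws=2_000, seed=7)
    grid = [(a / 10, b / 10) for a in range(-10, 1) for b in range(0, 21)]
    best = max(grid, key=lambda th: mc_loglik_ratio(th, psi, t_obs, sampled))
    print("approximate MLE:", best)
```

The approximation is only trustworthy for theta reasonably close to psi. If the maximizer lands far away, a common remedy is to resample at the new point and repeat. If t(x_obs) lies outside the convex hull of the sampled statistics, the approximant has no maximizer. This parallels the case where no parameter point maximizes the likelihood.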
Markov chain Monte Carlo (MCMC, the Metropolis-Hastings algorithm) has been used for many statistical problems, including Bayesian inference, likelihood inference, and tests of significance. Though the method often works well, doubts about convergence remain in all applications. Here we propose MCMC methods distantly related to simulated annealing. Our samplers mix rapidly enough to be usable for problems in which other methods would require eons of computing time. They simulate realizations from a sequence of distributions, allowing the distribution being simulated to vary randomly over time. If the sequence of distributions is well chosen, the sampler will mix well and produce accurate answers for all the distributions. Even when there is only one distribution of interest, these annealing-like samplers may be the only known way to get a rapidly mixing sampler. These methods are essential for attacking very hard problems, which arise in areas such as statistical genetics. We illustrate the methods with an application that is much harder than any problem previously done by Markov chain Monte Carlo. It involves ancestral inference on a very large genealogy (7 generations, 2024 individuals). The problem is to find, conditional on data on living individuals, the probability that each individual was a carrier of cystic fibrosis. The unconditional probabilities are easy to calculate, but exact calculation of the conditional probabilities is infeasible. Moreover, a Gibbs sampler for the problem would not mix in a reasonable time, even on the fastest imaginable computers. Our annealing-like samplers have mixing times of a few hours. We also give examples of samplers for the "witch's hat" distribution and the conditional Strauss process. The methods may also be useful for easier problems. It is a common concern about MCMC that one can never be sure that the chain was well mixed and the answers are correct. Although we have no guaranteed convergence bounds for our methods, annealing-like samplers do seem to be overkill in easy problems and so should dispel doubts about convergence.
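Simulated tempering is one well-known annealing-like scheme in which the simulated distribution varies randomly over time. The chain moves on pairs (x, k), where k indexes a ladder of tempered densities proportional to exp(-β_k U(x)). Only draws taken at the cold level β_0 = 1 are kept. The following is an illustration on a hypothetical bimodal toy target, not the genealogy sampler from the paper. The temperature ladder, the step size and the pseudo-prior log weights c_k = 0.5·log(β_k) are all assumptions chosen for this example; the weights are a rough offset for each level's normalizing constant.

```python
import math
import random
from typing import List, Sequence, Tuple


def energy(x: float) -> float:
    """U(x) = -log(exp(-(x-4)^2/2) + exp(-(x+4)^2/2)): two separated modes."""
    a = -0.5 * (x - 4.0) ** 2
    b = -0.5 * (x + 4.0) ** 2
    m = max(a, b)
    return -(m + math.log(math.exp(a - m) + math.exp(b - m)))


def simulated_tempering(n_iter: int, betas: Sequence[float],
                        log_weights: Sequence[float], step: float = 2.0,
                        seed: int = 0) -> Tuple[List[float], List[int]]:
    """Joint chain on (x, k) targeting exp(-betas[k] * U(x) + log_weights[k]).

    betas[0] = 1 is the distribution of interest; states visited while k == 0
    are returned as draws from it, together with the full level trace.
    """
    rng = random.Random(seed)
    x, k = 0.0, 0
    cold: List[float] = []
    levels: List[int] = []
    for _ in range(n_iter):
        # 1. Random-walk Metropolis update of x at the current level.
        y = x + rng.gauss(0.0, step)
        log_r = -betas[k] * (energy(y) - energy(x))
        if log_r >= 0 or rng.random() < math.exp(log_r):
            x = y
        # 2. Propose a neighbouring level (symmetric); off-ladder moves are rejected.
        j = k + rng.choice((-1, 1))
        if 0 <= j < len(betas):
            log_r = (-(betas[j] - betas[k]) * energy(x)
                     + log_weights[j] - log_weights[k])
            if log_r >= 0 or rng.random() < math.exp(log_r):
                k = j
        levels.append(k)
        if k == 0:
            cold.append(x)
    return cold, levels


if __name__ == "__main__":
    betas = [1.0, 0.6, 0.35, 0.2, 0.1]
    # Hypothetical pseudo-prior weights (assumption for this toy target).
    weights = [0.5 * math.log(b) for b in betas]
    cold, levels = simulated_tempering(50_000, betas, weights)
    right = sum(1 for v in cold if v > 0) / len(cold)
    print(f"{len(cold)} cold draws, fraction in right-hand mode: {right:.2f}")
```

With well-chosen weights, the chain spends comparable time at every level. The hot levels flatten the barrier between the modes at ±4, so cold-level draws can switch modes. A plain Metropolis sampler at β = 1 with the same step would rarely do this.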
The abnormally high number of centrosomes found in many human tumor cells can lead directly to aneuploidy and genomic instability through the formation of multipolar mitotic spindles. To facilitate investigation of the mechanisms that control centrosome reproduction, a frog egg extract arrested in S phase of the cell cycle that supported repeated assembly of daughter centrosomes was developed. Multiple rounds of centrosome reproduction were blocked by selective inactivation of cyclin-dependent kinase 2-cyclin E (Cdk2-E) and were restored by addition of purified Cdk2-E. Confocal immunomicroscopy revealed that cyclin E was localized at the centrosome. These results demonstrate that Cdk2-E activity is required for centrosome duplication during S phase and suggest a mechanism that could coordinate centrosome reproduction with cycles of DNA synthesis and mitosis.