Stein's method is used to obtain two theorems on multivariate normal approximation. Our main theorem, Theorem 1.2, provides a bound on the distance to normality for any non-negative random vector. Theorem 1.2 requires multivariate size bias coupling, which we discuss in studying the approximation of distributions of sums of dependent random vectors. In the univariate case, we briefly illustrate this approach for certain sums of nonlinear functions of multivariate normal variables. As a second illustration, we show that the multivariate distribution counting the number of vertices with given degrees in certain random graphs is asymptotically multivariate normal and obtain a bound on the rate of convergence. Both examples demonstrate that this approach may be suitable for situations involving non-local dependence. We also present Theorem 1.4 for sums of vectors having a local type of dependence. We apply this theorem to obtain a multivariate normal approximation for the distribution of the random p-vector that counts, for each of p colors, the number of edges in a fixed graph whose two vertices both have that color, when each vertex is colored independently with one of the p colors. None of the normal approximation results presented here requires an ordering of the summands related to the dependence structure. This is in contrast to the hypotheses of classical central limit theorems and examples, which involve, for example, martingale, Markov chain, or various mixing assumptions.
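To make the edge-counting statistic in the last application concrete, here is a minimal Python sketch. It is not taken from the paper: the function names `random_coloring` and `monochromatic_edge_counts` are hypothetical, and the uniform choice among the p colors is an assumption (the abstract only says the vertices are colored independently).

```python
import random


def random_coloring(num_vertices, p, rng):
    """Color each vertex independently with one of p colors.

    Assumption: colors are chosen uniformly; the abstract only states
    that the vertices are colored independently.
    """
    return [rng.randrange(p) for _ in range(num_vertices)]


def monochromatic_edge_counts(edges, colors, p):
    """Return the p-vector whose entry c counts the edges whose two
    endpoints both have color c."""
    counts = [0] * p
    for u, v in edges:
        if colors[u] == colors[v]:
            counts[colors[u]] += 1
    return counts


if __name__ == "__main__":
    # A fixed graph: the cycle on 4 vertices.
    cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
    rng = random.Random(0)
    coloring = random_coloring(4, 3, rng)
    print(coloring, monochromatic_edge_counts(cycle_edges, coloring, 3))
```

Two edges that share a vertex contribute dependent terms to this vector, while edges with no common vertex contribute independent ones, which is the local type of dependence the abstract refers to.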
In Drosophila, multiple lines of evidence converge in suggesting that beneficial substitutions to the genome may be common. All suffer from confounding factors, however, such that the interpretation of the evidence—in particular, conclusions about the rate and strength of beneficial substitutions—remains tentative. Here, we use genome-wide polymorphism data in D. simulans and sequenced genomes of its close relatives to construct a readily interpretable characterization of the effects of positive selection: the shape of average neutral diversity around amino acid substitutions. As expected under recurrent selective sweeps, we find a trough in diversity levels around amino acid but not around synonymous substitutions, a distinctive pattern that is not expected under alternative models. This characterization is richer than previous approaches, which relied on limited summaries of the data (e.g., the slope of a scatter plot), and relates to underlying selection parameters in a straightforward way, allowing us to make more reliable inferences about the prevalence and strength of adaptation. Specifically, we develop a coalescent-based model for the shape of the entire curve and use it to infer adaptive parameters by maximum likelihood. Our inference suggests that ∼13% of amino acid substitutions cause selective sweeps. Interestingly, it reveals two classes of beneficial fixations: a minority (approximately 3%) that appears to have had large selective effects and accounts for most of the reduction in diversity, and the remaining 10%, which seem to have had very weak selective effects. These estimates therefore help to reconcile the apparent conflict among previously published estimates of the strength of selection. More generally, our findings provide unequivocal evidence for strongly beneficial substitutions in Drosophila and illustrate how the rapidly accumulating genome-wide data can be leveraged to address enduring questions about the genetic basis of adaptation.