We developed a linear-time algorithm applicable to a large class of trait evolution models, for efficient likelihood calculations and parameter inference on very large trees. Our algorithm solves the traditional computational burden associated with two key terms, namely the determinant of the phylogenetic covariance matrix V and quadratic products involving the inverse of V. Applications include Gaussian models such as Brownian motion-derived models like Pagel's lambda, kappa, delta, and the early-burst model; Ornstein-Uhlenbeck models to account for natural selection with possibly varying selection parameters along the tree; as well as non-Gaussian models such as phylogenetic logistic regression, phylogenetic Poisson regression, and phylogenetic generalized linear mixed models. Outside of phylogenetic regression, our algorithm also applies to phylogenetic principal component analysis, phylogenetic discriminant analysis or phylogenetic prediction. The computational gain opens up new avenues for complex models or extensive resampling procedures on very large trees. We identify the class of models that our algorithm can handle as all models whose covariance matrix has a 3-point structure. We further show that this structure uniquely identifies a rooted tree whose branch lengths parametrize the trait covariance matrix, which acts as a similarity matrix. The new algorithm is implemented in the R package phylolm, including functions for phylogenetic linear regression and phylogenetic logistic regression.
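For orientation, the two terms named above are the ones that appear in the standard multivariate Gaussian log-likelihood. The notation below is an assumption for illustration and is not taken from the abstract: $y$ is the vector of $n$ trait values at the tips, $X$ is a design matrix, and $\beta$ is the vector of regression coefficients.

```latex
\[
\ell(\beta, V) \;=\; -\frac{1}{2}\Big[\, n\log(2\pi) \;+\; \log\det V \;+\; (y - X\beta)^{\top} V^{-1} (y - X\beta) \,\Big]
\]
```

With dense linear algebra, both $\log\det V$ and the quadratic form typically cost on the order of $n^3$ operations, which is the burden that a linear-time method avoids.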
Summary
1. For the study of macroevolution, phenotypic data are analysed across species on a dated phylogeny using phylogenetic comparative methods. In this context, the Ornstein-Uhlenbeck (OU) process is now being used extensively to model selectively driven trait evolution, whereby a trait is attracted to a selection optimum μ.
2. We report here theoretical properties of the maximum-likelihood (ML) estimators for these parameters, including their non-uniqueness and inaccuracy, and show that theoretical expectations indeed apply to real trees. We provide necessary conditions for ML estimators to be well defined and practical implications for model parametrization.
3. We then show how these limitations carry over to difficulties in detecting shifts in selection regimes along a phylogeny. When the phylogenetic placement of these shifts is unknown, we identify a 'large p, small n' problem where traditional model selection criteria fail and favour overly complex scenarios. Instead, we propose a modified criterion that is better adapted to change-point models.
4. The challenges we identify here are inherent to trait evolution models on phylogenetic trees when observations are limited to present-day taxa, and require the addition of fossil taxa to be alleviated. We conclude with recommendations for empiricists.
Significance
This paper compares the probabilistic accuracy of short-term forecasts of reported deaths due to COVID-19 during the first year and a half of the pandemic in the United States. Results show high variation in accuracy between and within stand-alone models and more consistent accuracy from an ensemble model that combined forecasts from all eligible models. This demonstrates that an ensemble model provided a reliable and comparatively accurate means of forecasting deaths during the COVID-19 pandemic that exceeded the performance of all of the models that contributed to it. This work strengthens the evidence base for synthesizing multiple models to support public-health action.
Hierarchical autocorrelation in the error term of linear models arises when sampling units are related to each other according to a tree. The residual covariance is parametrized using the tree distance between sampling units. When observations are modeled using an Ornstein-Uhlenbeck (OU) process along the tree, the autocorrelation between two tips decreases exponentially with their tree distance. These models are most often applied in evolutionary biology, when tips represent biological species and the OU process parameters represent the strength and direction of natural selection. For these models, we show that the mean is not microergodic: no estimator can ever be consistent for this parameter. We also provide a lower bound for the variance of its MLE. For covariance parameters, we give a general sufficient condition ensuring microergodicity. This condition suggests that some parameters may not be estimated at the same rate as others. We show that, indeed, maximum likelihood estimators of the autocorrelation parameter converge at a slower rate than those of generally microergodic parameters. We show this theoretically in a symmetric tree asymptotic framework and through simulations on a large real tree comprising 4507 mammal species.
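The exponential decay of autocorrelation with tree distance can be illustrated with a minimal sketch. It assumes the common stationary OU parametrization, in which the covariance between two tips is sigma2 / (2 * alpha) * exp(-alpha * d) for tree distance d. The function name `ou_covariance`, the parameter names, and the three-tip tree are hypothetical and are not taken from the paper.

```python
import numpy as np


def ou_covariance(dist, alpha, sigma2):
    """Covariance between tips under a stationary OU process (assumed form).

    dist   : matrix of pairwise tree distances between tips
    alpha  : selection strength; controls how fast correlation decays
    sigma2 : variance rate of the underlying Brownian noise
    """
    dist = np.asarray(dist, dtype=float)
    return sigma2 / (2.0 * alpha) * np.exp(-alpha * dist)


# Hypothetical ultrametric tree ((A:1,B:1):1,C:2):
# A and B are 2 apart, and each is 4 away from C.
dist = np.array([
    [0.0, 2.0, 4.0],
    [2.0, 0.0, 4.0],
    [4.0, 4.0, 0.0],
])

V = ou_covariance(dist, alpha=0.5, sigma2=1.0)

# Dividing by the common tip variance gives the autocorrelation matrix.
corr = V / V[0, 0]
print(corr)
```

In this toy tree the correlation between the sister tips A and B is larger than the correlation of either with C, and larger values of `alpha` shrink all off-diagonal correlations toward zero.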
Understanding the processes that give rise to quantitative measurements associated with molecular sequence data remains an important issue in statistical phylogenetics. Examples of such measurements include geographic coordinates in the context of phylogeography and phenotypic traits in the context of comparative studies. A popular approach is to model the evolution of continuously varying traits as a Brownian diffusion process acting on a phylogenetic tree. However, standard Brownian diffusion is quite restrictive and may not accurately characterize certain trait evolutionary processes. Here, we relax one of the major restrictions of standard Brownian diffusion by incorporating a nontrivial estimable mean into the process. We introduce a relaxed directional random walk (RDRW) model for the evolution of multivariate continuously varying traits along a phylogenetic tree. Notably, the RDRW model accommodates branch-specific variation of directional trends while preserving model identifiability. Furthermore, our development of a computationally efficient dynamic programming approach to compute the data likelihood enables scaling of our method to large data sets frequently encountered in phylogenetic comparative studies and viral evolution. We implement the RDRW model in a Bayesian inference framework to simultaneously reconstruct the evolutionary histories of molecular sequence data and associated multivariate continuous trait data, and provide tools to visualize evolutionary reconstructions. We demonstrate the performance of our model on synthetic data, and we illustrate its utility in two viral examples. First, we examine the spatiotemporal spread of HIV-1 in central Africa and show that the RDRW model uncovers a clearer, more detailed picture of the dynamics of viral dispersal than standard Brownian diffusion. Second, we study antigenic evolution in the context of HIV-1 resistance to three broadly neutralizing antibodies. 
Our analysis reveals evidence of a continuous drift at the HIV-1 population level towards enhanced resistance to neutralization by the VRC01 monoclonal antibody over the course of the epidemic. [Brownian Motion; Diffusion Processes; Phylodynamics; Phylogenetics; Phylogeography; Trait Evolution.].