Phylogenomics, the estimation of species trees from multilocus datasets, is a common step in many biological studies. However, this estimation is challenged by the fact that genes can evolve under processes, including incomplete lineage sorting (ILS) and gene duplication and loss (GDL), that make their trees different from the species tree. In this paper, we address the challenge of estimating the species tree under GDL. We show that species trees are identifiable under a standard stochastic model for GDL, and that the polynomial-time algorithm ASTRAL-multi, a recent development in the ASTRAL suite of methods, is statistically consistent under this GDL model. We also provide a simulation study evaluating ASTRAL-multi for species tree estimation under GDL.
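To make the GDL setting concrete, the following is a minimal sketch of how the number of copies of a gene might change along a single species-tree branch. It assumes a linear birth-death process in which each copy duplicates at rate `dup_rate` and is lost at rate `loss_rate`, independently of the other copies. The function name, parameters, and the restriction to one branch are illustrative assumptions, not the model specification or simulation protocol used in the paper.

```python
import random


def simulate_copy_count(duration, dup_rate, loss_rate, start_copies=1, rng=None):
    """Simulate the gene copy count at the end of one branch.

    Hypothetical helper: assumes each copy independently duplicates at
    dup_rate and is lost at loss_rate (a linear birth-death process).
    """
    if rng is None:
        rng = random.Random()
    per_copy_rate = dup_rate + loss_rate
    copies, elapsed = start_copies, 0.0
    # Stop when the family is extinct or no event can occur.
    while copies > 0 and per_copy_rate > 0:
        # Waiting time to the next event across all current copies.
        elapsed += rng.expovariate(copies * per_copy_rate)
        if elapsed > duration:
            break
        # The event is a duplication with probability dup / (dup + loss).
        if rng.random() < dup_rate / per_copy_rate:
            copies += 1
        else:
            copies -= 1
    return copies
```

Calling `simulate_copy_count(1.0, 0.5, 0.3, rng=random.Random(0))` returns one random draw of the copy count. A count of 0 means the gene family was lost on that branch, and a count above 1 means the species carries multiple copies, which is the multi-copy input that a method such as ASTRAL-multi has to handle.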
In a striking result, Louca and Pennell (2020) recently proved that a large class of birth-death models is statistically unidentifiable from lineage-through-time (LTT) data. Specifically, they showed that any pair of sufficiently smooth birth and death rate functions is "congruent" to an infinite collection of other rate functions, all of which have the same likelihood for any LTT vector of any dimension. This fact has distressing implications for the thousands of studies that have used birth-death models to study evolution.
In this paper, we qualify their finding by proving that an alternative and widely used class of birth-death models is indeed identifiable. Specifically, we show that piecewise constant birth-death models can, in principle, be consistently estimated and distinguished from one another, given a sufficiently large extant time tree and some knowledge of the present-day population. Subject to mild regularity conditions, we further show that any unidentifiable birth-death model class can be approximated arbitrarily closely by a class of identifiable models. The sampling requirements needed for our results to hold are explicit and are expected to be satisfied in many contexts, such as the phylodynamic analysis of a global pandemic.
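As an illustration of what "piecewise constant" means here, the sketch below represents a birth-death model by a list of breakpoints and one birth rate and one death rate per interval. The class name `PiecewiseConstantBD`, its fields, and the convention that each interval is closed on the left are assumptions made for this example. It only shows the parameterization and is not the estimator analyzed in the paper.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class PiecewiseConstantBD:
    """Hypothetical container for piecewise constant birth and death rates."""

    breakpoints: Sequence[float]  # k - 1 strictly increasing times
    birth: Sequence[float]        # k birth rates, one per interval
    death: Sequence[float]        # k death rates, one per interval

    def __post_init__(self):
        k = len(self.birth)
        if k == 0 or len(self.death) != k or len(self.breakpoints) != k - 1:
            raise ValueError("need k birth rates, k death rates, k - 1 breakpoints")
        if any(a >= b for a, b in zip(self.breakpoints, self.breakpoints[1:])):
            raise ValueError("breakpoints must be strictly increasing")
        if any(r < 0 for r in list(self.birth) + list(self.death)):
            raise ValueError("rates must be non-negative")

    def rates_at(self, t: float) -> Tuple[float, float]:
        """Return (birth, death) for the interval containing time t."""
        # bisect_right assigns a breakpoint itself to the later interval.
        i = bisect_right(self.breakpoints, t)
        return self.birth[i], self.death[i]
```

For example, `PiecewiseConstantBD((1.0, 2.5), (2.0, 1.0, 0.5), (0.5, 0.5, 0.1)).rates_at(0.5)` gives `(2.0, 0.5)`. A model of this form is described by finitely many numbers, in contrast to the smooth rate functions discussed above.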