Genome-scale DNA sequence data and the evolutionary history of placental mammals

Wu, Shaoyuan; Edwards, Scott V.; Lang, Liang

doi:10.1016/j.dib.2018.04.094

Cited by 20 publications

(15 citation statements)

References 8 publications

“…In many cases, it was difficult to determine whether these alignments had been rigorously curated, and even more challenging to find datasets for which the root position of a number of subclades could be assumed with confidence. The only dataset that met all of our criteria was a dataset of placental mammals with 78 ingroup taxa and 3,050,199 amino acids (Wu, et al 2019). This dataset was originally published as an MSA (Liu, et al 2017) based on very high-quality sequences from Ensembl, NCBI, and GenBank databases.…”

Section: Empirical Datasets (mentioning)

confidence: 99%

Assessing Confidence in Root Placement on Phylogenies: An Empirical Study Using Non-Reversible Models for Mammals

Naser-Khdour, Minh, Lanfear. 2020. Preprint.

Using time-reversible Markov models is a very common practice in phylogenetic analysis, because although we expect many of their assumptions to be violated by empirical data, they provide high computational efficiency. However, these models lack the ability to infer the root placement of the estimated phylogeny. To compensate for the inability of these models to root the tree, many researchers use external information, such as outgroup taxa, or additional assumptions, such as molecular clocks. In this study, we investigate the utility of non-reversible models to root empirical phylogenies and introduce a new bootstrap measure, the rootstrap, which provides information on the statistical support for any given root position. Availability and implementation: A Python script for calculating rootstrap support values is available at https://github.com/suhanaser/Rootstrap.

“…These exchanges and issues prompted us to explore in a general way the effects of alignment uncertainty and priors on a large phylogenomic data set in mammals, a useful test group with a history of coalescent analyses on large and diverse data sets [42–44]. A larger and improved set of alignments of [34] based on careful codon-based alignment and a state-of-the-art trimming pipeline is now available [45]. The current study comprehensively analyzes this data set of 5162 loci (total alignment length 9,150,597-14,623,557 bp) and 90 species [45] to evaluate the effects of alignment uncertainty, substitution model, and fossil priors on gene tree, species tree, and divergence time estimation in mammals.…”

Section: Introduction (mentioning)

confidence: 99%

The effect of alignment uncertainty, substitution models and priors in building and dating the mammal tree of life

Edwards et al. 2019. BMC Evol Biol. Self-citation.

Background: The flood of genomic data to help build and date the tree of life requires automation at several critical junctures, most importantly during sequence assembly and alignment. It is widely appreciated that automated alignment protocols can yield inaccuracies, but the relative impact of various sources of error on phylogenomic analysis is not yet known. This study employs an updated mammal data set of 5162 coding loci sampled from 90 species to evaluate the effects of alignment uncertainty, substitution models, and fossil priors on gene tree, species tree, and divergence time estimation. Additionally, a novel coalescent likelihood ratio test is introduced for comparing competing species trees against a given set of gene trees. Results: The aligned DNA sequences of 5162 loci from 90 species were trimmed and filtered using trimAl and two filtering protocols. The final dataset contains 4 sets of alignments: before trimming, after trimming, filtered by a recently proposed pipeline, and further filtered by comparing ML gene trees for each locus with the concatenation tree. Our analyses suggest that the average discordance among the coalescent trees is significantly smaller than that among the concatenation trees estimated from the 4 sets of alignments or with different substitution models. There is no significant difference among the divergence times estimated with different substitution models. However, the divergence dates estimated from the alignments after trimming are more recent than those estimated from the alignments before trimming. Conclusions: Our results highlight that alignment uncertainty of the updated mammal data set and the choice of substitution models have little impact on tree topologies yielded by coalescent methods for species tree estimation, whereas they are more influential on the trees made by concatenation. Given the choice of calibration scheme and clock models, divergence time estimates are robust to the choice of substitution models, but removing alignments deemed problematic by trimming algorithms can lead to more recent dates. Although the fossil prior is important in divergence time estimation, Bayesian estimates of divergence times in this data set are driven primarily by the sequence data.

“…After data reduction, the data sets contained alignments of 36 (Aitken et al. 2017) to 4709 (Wu et al. 2018) loci, each with 10 species (Table 1).…”

Section: Methods (mentioning)

confidence: 99%

The Multispecies Coalescent Model Outperforms Concatenation across Diverse Phylogenomic Data Sets

Jiang, Edwards, Liu. 2019. Preprint. Self-citation.

A statistical framework of model comparison and model validation is essential to resolving the debates over concatenation and coalescent models in phylogenomic data analysis. A set of statistical tests is here applied and developed to evaluate and compare the adequacy of substitution, concatenation, and multispecies coalescent (MSC) models across 47 phylogenomic data sets collected across the tree of life. Tests for substitution models and the concatenation assumption of topologically concordant gene trees suggest that a poor fit of substitution models (44% of loci rejecting the substitution model) and concatenation models (38% of loci rejecting the hypothesis of topologically congruent gene trees) is widespread. Logistic regression shows that the proportions of GC content and informative sites are both negatively correlated with the fit of substitution models across loci. Moreover, a substantial violation of the concatenation assumption of congruent gene trees is consistently observed across 6 major groups (birds, mammals, fish, insects, reptiles, and others, including other invertebrates). In contrast, among those loci adequately described by a given substitution model, the proportion of loci rejecting the MSC model is 11%, significantly lower than those rejecting the substitution and concatenation models, and Bayesian model comparison strongly favors the MSC over concatenation across all data sets. Species tree inference suggests that loci rejecting the MSC have little effect on species tree estimation. Due to computational constraints, the Bayesian model validation and comparison analyses were conducted on the reduced data sets. A complete analysis of phylogenomic data requires the development of efficient algorithms for phylogenetic inference. Nevertheless, the concatenation assumption of congruent gene trees rarely holds for phylogenomic data with more than 10 loci. Thus, for large phylogenomic data sets, model comparison analyses are expected to consistently and more strongly favor the coalescent model over the concatenation model. Our analysis reveals the value of model validation and comparison in phylogenomic data analysis, as well as the need for further improvements of multilocus models and computational tools for phylogenetic inference.
