Finding optimal evolutionary trees from sequence data is typically an intractable problem, and there is usually no way of knowing how close to optimal the best tree from a search truly is. The problem would seem to be particularly acute when we have many taxa and when the data have high levels of homoplasy, meaning that the individual characters require many changes to fit on the best tree. However, a recent mathematical result has provided a precise tool to generate a small number of high-homoplasy characters for any given tree, so that this tree is provably the optimal tree under the maximum parsimony criterion. This provides, for the first time, a rigorous way to test tree search algorithms on homoplasy-rich data, where we know in advance what the 'best' tree is. In this short note we consider just one search program (TNT), but show that it is able to locate the globally optimal tree correctly for 32,768 taxa, even though the characters in the dataset require, on average, 1148 state changes each to fit on this tree, and the number of characters is only 57.

Keywords: Phylogenetic tree, maximum parsimony, homoplasy, tree search

Phylogenetic tree reconstruction methods based on optimization criteria (such as maximum parsimony or maximum likelihood) have long been known to be computationally intractable (NP-hard) (Foulds and Graham, 1982). However, on perfectly tree-like data (i.e. long sequences with low homoplasy), these methods will generally find the optimal tree quickly, even for large datasets.
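The number of state changes a character "requires" on a tree is its parsimony score, and the maximum parsimony tree is the one that minimizes the sum of these scores over all characters. A character's homoplasy can be read as its extra steps: the score minus the minimum possible, which is (number of observed states − 1). The following is a minimal sketch of Fitch's algorithm for computing this score. It assumes a rooted binary tree written as nested Python tuples of leaf labels and a character given as a dict from leaf label to state. The names `fitch_score`, `extra_steps` and `tree_length` are illustrative only. This is not how TNT represents or scores trees.

```python
from typing import Dict, Hashable, List, Set, Tuple, Union

# Assumed representation: a rooted binary tree is either a leaf label (str)
# or a pair (left_subtree, right_subtree).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_score(tree: Tree, character: Dict[str, Hashable]) -> int:
    """Minimum number of state changes the character needs on the tree
    (its parsimony score), computed by Fitch's bottom-up pass."""
    changes = 0

    def state_set(node: Tree) -> Set[Hashable]:
        nonlocal changes
        if isinstance(node, str):
            # A leaf's candidate set is just its observed state.
            return {character[node]}
        left, right = node
        a, b = state_set(left), state_set(right)
        common = a & b
        if common:
            return common
        # Disjoint child sets force one state change below this node.
        changes += 1
        return a | b

    state_set(tree)
    return changes


def extra_steps(tree: Tree, character: Dict[str, Hashable]) -> int:
    """Homoplasy of one character: changes beyond the minimum (states - 1)."""
    n_states = len(set(character.values()))
    return fitch_score(tree, character) - (n_states - 1)


def tree_length(tree: Tree, characters: List[Dict[str, Hashable]]) -> int:
    """Parsimony length of a dataset: the quantity maximum parsimony minimizes."""
    return sum(fitch_score(tree, c) for c in characters)
```

For the quartet `(("A", "B"), ("C", "D"))`, the character `{"A": 0, "B": 1, "C": 0, "D": 1}` needs two changes even though one would suffice on the tree `(("A", "C"), ("B", "D"))`, so it carries one extra step of homoplasy on the first tree. The recursion is fine for balanced trees. Very deep, caterpillar-like trees with thousands of taxa would need an iterative traversal to stay within Python's recursion limit.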
Moreover, when data are largely tree-like, there are good theoretical and computational methods for finding an optimal tree under criteria such as maximum parsimony. An early result dates from more than 30 years ago (Hendy et al., 1980), and there have been more recent developments (Blelloch et al., 2006; Holland et al., 2005).

So far, it has not been clear whether such methods would be able to find the globally optimal tree for homoplasy-rich datasets with large numbers of taxa, particularly when the sequences are short. The traditional view (Sokal and Sneath, 1963) is that homoplasy tends to obscure tree signal, so that more character data are needed to recover a tree than would be needed with homoplasy-free data. Contrary opinions, that homoplasy can 'help', have also appeared (Källersjö et al., 1999).

A fundamental obstacle arises in trying to answer this question: one usually cannot guarantee in advance that any tree will be optimal for homoplasy-rich data without first searching exhaustively through tree space, and this precludes datasets involving hundreds (let alone thousands) of taxa. However, a recent mathematical result by Chai and Hous-