Abstract. Within the field of phylogenetics there is great interest in distance measures to quantify the dissimilarity of two trees. Here, based on an idea of Bruen and Bryant, we propose and analyze a new distance measure: the Maximum Parsimony (MP) distance. This is based on the difference of the parsimony scores of a single character on both trees under consideration, and the goal is to find the character which maximizes this difference. In this article we show that this new distance is a metric and provides a lower bound to the well-known Subtree Prune and Regraft (SPR) distance. We also show that to compute the MP distance it is sufficient to consider only characters that are convex on one of the trees, and prove several additional structural properties of the distance. On the complexity side, we prove that calculating the MP distance is in general NP-hard, and identify an interesting island of tractability in which the distance can be calculated in polynomial time.
Mathematics Subject Classification (2010). 05C15; 05C35; 90C35; 92D15.
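To make the definition in the abstract above concrete, here is a minimal Python sketch that computes the parsimony score of a character with Fitch's algorithm and then finds the MP distance by trying every character. The nested-tuple tree encoding and the names `fitch_score`, `leaves` and `mp_distance` are assumptions made for this illustration and do not come from the paper. The exhaustive search is exponential in the number of taxa, so it is only usable on very small trees and is not the method analyzed by the authors.

```python
from itertools import product


def fitch_score(tree, character):
    """Parsimony score of `character` (dict: leaf -> state) on a binary tree.

    The tree is a nested 2-tuple with leaf names at the tips. The root
    position is arbitrary: the Fitch score does not depend on it.
    """
    def visit(node):
        if not isinstance(node, tuple):
            return {character[node]}, 0
        (left_set, left_cost), (right_set, right_cost) = visit(node[0]), visit(node[1])
        common = left_set & right_set
        if common:
            return common, left_cost + right_cost
        # Disjoint state sets force one extra change at this node.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def leaves(tree):
    """List the leaf names of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def mp_distance(tree1, tree2):
    """Brute-force MP distance: the largest score difference over all characters."""
    taxa = sorted(leaves(tree1))
    if taxa != sorted(leaves(tree2)):
        raise ValueError("both trees must have the same leaf set")
    best = 0
    # With n taxa, at most n distinct states are ever needed.
    for states in product(range(len(taxa)), repeat=len(taxa)):
        character = dict(zip(taxa, states))
        diff = abs(fitch_score(tree1, character) - fitch_score(tree2, character))
        best = max(best, diff)
    return best


t1 = (("a", "b"), ("c", "d"))
t2 = (("a", "c"), ("b", "d"))
print(mp_distance(t1, t2))
```

For these two four-taxon trees, the character that groups `a, b` against `c, d` needs one change on `t1` and two on `t2`, and no character does better, so the sketch reports a distance of 1.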
Abstract. Phylogenetic networks are used to display the relationship among different species whose evolution is not treelike, which is the case, for instance, in the presence of hybridization events or horizontal gene transfers. Tree inference methods such as Maximum Parsimony need to be modified in order to be applicable to networks. In this paper, we discuss two different definitions of Maximum Parsimony on networks, "hardwired" and "softwired", and examine the complexity of computing them given a network topology and a character. By exploiting a link with the problem Multicut, we show that computing the hardwired parsimony score for 2-state characters is polynomial-time solvable, while for characters with more states this problem becomes NP-hard but is still approximable and fixed-parameter tractable in the parsimony score. On the other hand, we show that, for the softwired definition, obtaining even weak approximation guarantees is already difficult for binary characters and restricted network topologies, and fixed-parameter tractable algorithms in the parsimony score are unlikely. On the positive side, we show that computing the softwired parsimony score is fixed-parameter tractable in the level of the network, a natural parameter describing how tangled reticulate activity is in the network. Finally, we show that both the hardwired and softwired parsimony scores can be computed efficiently using Integer Linear Programming. The software has been made freely available.
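The difference between the two definitions can be seen on a toy example. The sketch below assumes the usual reading of the terms: the hardwired score is the minimum number of network edges whose endpoints receive different states, and the softwired score is the minimum parsimony score over the trees obtained by keeping exactly one incoming edge per reticulation node. The edge-list encoding, the function names and the small non-binary network are hypothetical, and the exhaustive enumeration merely stands in for the Multicut, FPT and ILP techniques mentioned in the abstract.

```python
from itertools import product


def hardwired_score(edges, leaf_states):
    """Minimum number of edges with differing endpoint states (brute force)."""
    nodes = {v for edge in edges for v in edge}
    internal = sorted(nodes - set(leaf_states))
    states = sorted(set(leaf_states.values()))
    best = len(edges)
    for assignment in product(states, repeat=len(internal)):
        label = dict(leaf_states)
        label.update(zip(internal, assignment))
        best = min(best, sum(label[u] != label[v] for u, v in edges))
    return best


def softwired_score(edges, leaf_states):
    """Minimum score over all choices of one parent per reticulation node."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, []).append(u)
    reticulations = [v for v, ps in parents.items() if len(ps) > 1]
    best = len(edges)
    for choice in product(*(parents[v] for v in reticulations)):
        kept = dict(zip(reticulations, choice))
        # Drop every reticulation edge that was not selected.
        switched = [(u, v) for u, v in edges if kept.get(v, u) == u]
        best = min(best, hardwired_score(switched, leaf_states))
    return best


# Hypothetical toy network: h is a reticulation with parents u and v.
edges = [
    ("r", "u"), ("r", "v"),
    ("u", "a1"), ("u", "a2"), ("u", "h"),
    ("v", "c1"), ("v", "c2"), ("v", "h"),
    ("h", "b"),
]
character = {"a1": 0, "a2": 0, "b": 0, "c1": 1, "c2": 1}

print(hardwired_score(edges, character), softwired_score(edges, character))
```

On this network the hardwired score is 2, because both edges entering `h` are counted, while the softwired score is 1, because `h` can be attached below `u` only.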
Planning for the protection of species often involves difficult choices about which species to prioritize, given constrained resources. One way of prioritizing species is to consider their “evolutionary distinctiveness”, i.e. their relative evolutionary isolation on a phylogenetic tree. Several evolutionary isolation metrics or phylogenetic diversity indices have been introduced in the literature, among them the so-called Fair Proportion index (also known as the “evolutionary distinctiveness” score). This index apportions the total diversity of a tree among all leaves, thereby providing a simple prioritization criterion for conservation. Here, we focus on the prioritization order obtained from the Fair Proportion index and analyze the effects of species extinction on this ranking. More precisely, we analyze the extent to which the ranking order may change when some species go extinct and the Fair Proportion index is re-computed for the remaining taxa. We show that for each phylogenetic tree, there are edge lengths such that the extinction of one leaf per cherry completely reverses the ranking. Moreover, we show that even if only the lowest ranked species goes extinct, the ranking order may drastically change. We end by analyzing the effects of these two extinction scenarios (extinction of the lowest ranked species and extinction of one leaf per cherry) for a collection of empirical and simulated trees. In both cases, we can observe significant changes in the prioritization orders, highlighting the empirical relevance of our theoretical findings.
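As a small illustration of how the Fair Proportion index shares edge lengths among leaves, and of how an extinction can reorder the ranking, consider the following Python sketch. It assumes the standard definition in which each edge length is divided equally among the leaves descending from that edge. The tree encoding (a leaf name, or a list of `(subtree, branch_length)` pairs), the function names and the edge lengths are made up for this example and are not taken from the paper.

```python
def fair_proportion(tree):
    """Return a dict mapping each leaf to its Fair Proportion index."""
    scores = {}

    def visit(node):
        if isinstance(node, str):
            scores[node] = 0.0
            return [node]
        below = []
        for child, length in node:
            child_leaves = visit(child)
            # Each edge is shared equally by the leaves below it.
            for leaf in child_leaves:
                scores[leaf] += length / len(child_leaves)
            below.extend(child_leaves)
        return below

    visit(tree)
    return scores


def remove_leaf(tree, name):
    """Copy of `tree` without leaf `name`; a node left with one child is
    suppressed and its two edges are merged. Removing a child of the root
    is not handled in this sketch."""
    if isinstance(tree, str):
        return tree
    children = []
    for child, length in tree:
        if child == name:
            continue
        sub = remove_leaf(child, name)
        if not isinstance(sub, str) and len(sub) == 1:
            only_child, extra = sub[0]
            children.append((only_child, length + extra))
        else:
            children.append((sub, length))
    return children


# Hypothetical tree: cherry (a, b) on a long stem, plus leaf c.
tree = [([("a", 1.0), ("b", 2.0)], 4.0), ("c", 3.5)]

before = fair_proportion(tree)
after = fair_proportion(remove_leaf(tree, "b"))
print(before)
print(after)
```

Before the extinction, `c` (3.5) ranks above `a` (3.0). Once `b` is removed, `a` inherits the whole stem and scores 5.0, so the order of `a` and `c` is reversed.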
Rabies is caused by lyssaviruses, and is one of the oldest known zoonoses. In recent years, more than 21,000 nucleotide sequences of rabies viruses (RABV), from the prototype species rabies lyssavirus, have been deposited in public databases. Subsequent phylogenetic analyses in combination with metadata suggest geographic distributions of RABV. However, these analyses face technical difficulties in defining verifiable criteria for cluster allocations in phylogenetic trees, which calls for a more rational approach. Therefore, we applied a relatively new mathematical clustering algorithm named ‘affinity propagation clustering’ (AP) to propose a standardized sub-species classification utilizing full-genome RABV sequences. Because AP is computationally fast and works for any meaningful measure of similarity between data samples, it has previously been applied successfully in bioinformatics for the analysis of microarray and gene expression data; cluster analysis of sequences, however, is still in its infancy. Existing (516) and original (46) full-genome RABV sequences were used to demonstrate the application of AP for RABV clustering. On a global scale, AP proposed four clusters, i.e. New World cluster, Arctic/Arctic-like, Cosmopolitan, and Asian as previously assigned by phylogenetic studies. By combining AP with established phylogenetic analyses, it is possible to resolve phylogenetic relationships between verifiably determined clusters and sequences. This workflow will be useful in confirming cluster distributions in a uniform transparent manner, not only for RABV, but also for other comparative sequence analyses.
In phylogenetics, distances are often used to measure the incongruence between a pair of phylogenetic trees that are reconstructed by different methods or using different regions of the genome. Motivated by the maximum parsimony principle in tree inference, we recently introduced the maximum parsimony (MP) distance, which enjoys various attractive properties due to its connection with several other well-known tree distances, such as TBR and SPR. Here we show that computing the MP distance between two trees, an NP-hard problem in general, is fixed-parameter tractable in terms of the TBR distance between the tree pair. Our approach is based on two reduction rules – the chain reduction and the subtree reduction – that are widely used in computing TBR and SPR distances. More precisely, we show that reducing chains to length 4 (but not shorter) preserves the MP distance. In addition, we describe a generalization of the subtree reduction which allows the pendant subtrees to be rooted in different places, and show that this still preserves the MP distance. On a slightly different note, we also show that Monadic Second Order Logic (MSOL), posited over an auxiliary graph structure known as the display graph (obtained by merging the two trees at their leaves), can be used to obtain an alternative proof that computation of the MP distance is fixed-parameter tractable in terms of the TBR distance. We conclude with an extended discussion in which we focus on similarities and differences between MP distance and TBR distance and present a number of open problems. One particularly intriguing question, emerging from the MSOL formulation, is whether two trees with bounded MP distance induce display graphs of bounded treewidth.