Branch-and-Bound Approach for Parsimonious Inference of a Species Tree from a Set of Gene Family Trees

Doyon, Jean-Philippe; Chauve, Cédric

doi:10.1007/978-1-4419-7046-6_29

Cited by 10 publications

(8 citation statements)

References 22 publications


“…More recently, a branch-and-bound algorithm to identify exact solutions for the GD problem was introduced [14]. This algorithm was applied to a data-set consisting of 1,111 gene trees with 29-taxa, but it did not run to completion.…”

Section: Introduction (mentioning)

confidence: 99%

An ILP solution for the gene duplication problem

et al. 2011


Background: The gene duplication (GD) problem seeks a species tree that implies the fewest gene duplication events across a given collection of gene trees. Solving this problem makes it possible to use large gene families with complex histories of duplication and loss to infer phylogenetic trees. However, the GD problem is NP-hard, and therefore, most analyses use heuristics that lack any performance guarantee. Results: We describe the first integer linear programming (ILP) formulation to solve instances of the gene duplication problem exactly. With simulations, we demonstrate that the ILP solution can solve problem instances with up to 14 taxa. Furthermore, we apply the new ILP solution to solve the gene duplication problem for the seed plant phylogeny using a 12-taxon, 6,084-gene data set. The unique, optimal solution, which places Gnetales sister to the conifers, represents a new, large-scale genomic perspective on one of the most puzzling questions in plant systematics. Conclusions: Although the GD problem is NP-hard, our novel ILP solution for it can solve instances with data sets consisting of as many as 14 taxa and 1,000 genes in a few hours. These are the largest instances that have been solved to optimality to date. Thus, this work can provide large-scale genomic perspectives on phylogenetic questions that previously could only be addressed by heuristic estimates.
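
To make the GD objective concrete, the following is a minimal sketch of how the duplication cost of one candidate species tree could be scored using the standard LCA mapping: a gene-tree node counts as a duplication when it maps to the same species-tree node as one of its children. It is not the ILP formulation or the branch-and-bound algorithm from the cited papers. The nested-tuple tree encoding, the assumption that gene-tree leaves are labeled directly with species names, and the function names `clusters`, `count_duplications`, and `gd_cost` are all illustrative choices, not taken from the source.

```python
from typing import Iterable, Set, Tuple, Union

# Assumed encoding: a tree is a species name (leaf) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def clusters(tree: Tree) -> Set[frozenset]:
    """Return the leaf set of every node in a rooted binary tree."""
    found: Set[frozenset] = set()

    def walk(node: Tree) -> frozenset:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = walk(node[0]) | walk(node[1])
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def count_duplications(gene_tree: Tree, species_tree: Tree) -> int:
    """Count gene-tree nodes whose LCA mapping equals that of a child."""
    species_clusters = clusters(species_tree)

    def lca(leaves: frozenset) -> frozenset:
        # Species-tree clusters containing `leaves` are nested, so the
        # smallest one identifies the lowest common ancestor.
        return min((c for c in species_clusters if leaves <= c), key=len)

    duplications = 0

    def walk(node: Tree) -> Tuple[frozenset, frozenset]:
        nonlocal duplications
        if isinstance(node, str):
            leaves = frozenset([node])
            return leaves, lca(leaves)
        left_leaves, left_map = walk(node[0])
        right_leaves, right_map = walk(node[1])
        leaves = left_leaves | right_leaves
        mapped = lca(leaves)
        if mapped == left_map or mapped == right_map:
            duplications += 1
        return leaves, mapped

    walk(gene_tree)
    return duplications


def gd_cost(gene_trees: Iterable[Tree], species_tree: Tree) -> int:
    """Total duplications implied by a species tree over all gene trees."""
    return sum(count_duplications(g, species_tree) for g in gene_trees)
```

An exact method for the GD problem searches over species trees for one minimizing this total; the sketch only evaluates a single candidate, and it scans all clusters per node for clarity rather than speed.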


“…Here we address the problem of finding a species tree that has a minimum total number of duplications and losses, treating incompleteness as due to true biological loss. Prior results on GTP include a branch-and-bound algorithm in [ 23 ] based on techniques from [ 18 ], a randomized hill-climbing heuristic presented in [ 4 ], a probabilistic and computationally expensive method for co-estimating gene and species trees [ 1 ], and dynamic programming based solutions by Hallett and Lagergren [ 15 ], Bayzid et al [ 20 ] and Chang et al [ 24 ]. However, none of these studies takes the reasons of incompleteness into account, and we have already shown that the standard calculation for losses can be incorrect when incompleteness is due to true biological loss.…”

Section: Algorithms To Find Species Trees (mentioning)

confidence: 99%

Gene tree parsimony for incomplete gene trees: addressing true biological loss

Bayzid, Warnow 2018, Algorithms Mol Biol


Motivation: Species tree estimation from gene trees can be complicated by gene duplication and loss, and “gene tree parsimony” (GTP) is one approach for estimating species trees from multiple gene trees. In its standard formulation, the objective is to find a species tree that minimizes the total number of gene duplications and losses with respect to the input set of gene trees. Although much is known about GTP, little is known about how to treat inputs containing some incomplete gene trees (i.e., gene trees lacking one or more of the species). Results: We present new theory for GTP considering whether the incompleteness is due to gene birth and death (i.e., true biological loss) or taxon sampling, and present dynamic programming algorithms that can be used for an exact but exponential time solution for small numbers of taxa, or as a heuristic for larger numbers of taxa. We also prove that the “standard” calculations for duplications and losses exactly solve GTP when incompleteness results from taxon sampling, although they can be incorrect when incompleteness results from true biological loss. The software for the DP algorithm is freely available as open source code at https://github.com/smirarab/DynaDup.


“…Here we introduce iGTP, a stand-alone software application with an easy-to-use graphical user interface (Figure 1 ) that makes it possible to conduct large-scale gene tree parsimony analyses on hundreds of taxa and thousands of gene phylogenies for three of the most important variants of the GTP problem: (i) the duplication problem [ 7 , 25 - 32 ], which minimizes the number of gene duplications, (ii) the duplication-loss problem [ 7 , 25 - 34 ], which minimizes the number of gene duplications and losses, and (iii) the deep-coalescence problem [ 17 , 35 , 36 ], which minimizes the number of deep coalescences. All of these variants of GTP are intrinsically hard [ 37 , 38 ], and exact algorithms [ 15 , 17 , 39 , 40 ] are feasible only when there are very few taxa. Therefore, iGTP relies on widely-used local search heuristics that have been proven to be effective in previous studies [ 36 , 41 , 42 ].…”

Section: Introduction (mentioning)

confidence: 99%

iGTP: A software package for large-scale gene tree parsimony analysis

et al. 2010


Background: The ever-increasing wealth of genomic sequence information provides an unprecedented opportunity for large-scale phylogenetic analysis. However, species phylogeny inference is obfuscated by incongruence among gene trees due to evolutionary events such as gene duplication and loss, incomplete lineage sorting (deep coalescence), and horizontal gene transfer. Gene tree parsimony (GTP) addresses this issue by seeking a species tree that requires the minimum number of evolutionary events to reconcile a given set of incongruent gene trees. Despite its promise, the use of gene tree parsimony has been limited by the fact that existing software is either not fast enough to tackle large data sets or is restricted in the range of evolutionary events it can handle. Results: We introduce iGTP, a platform-independent software program that implements state-of-the-art algorithms that greatly speed up species tree inference under the duplication, duplication-loss, and deep coalescence reconciliation costs. iGTP significantly extends and improves the functionality and performance of existing gene tree parsimony software and offers advanced features such as building effective initial trees using stepwise leaf addition and the ability to have unrooted gene trees in the input. Moreover, iGTP provides a user-friendly graphical interface with integrated tree visualization software to facilitate analysis of the results. Conclusions: iGTP enables, for the first time, gene tree parsimony analyses of thousands of genes from hundreds of taxa using the duplication, duplication-loss, and deep coalescence reconciliation costs, all from within a convenient graphical user interface.
