Gene tree correction with respect to a given species tree is a problem that has been recently proposed in order to better understand the evolution of gene families. One of the combinatorial methods proposed to tackle this problem aims to correct a gene tree by removing the minimum number of leaves or labels (Minimum Leaf Removal and Minimum Label Removal, respectively). The two problems have been shown to be APX-hard, and fixed-parameter tractable when parameterized by the number of leaves or labels removed. In this paper, we focus on the approximation complexity of these two problems and we show that they are not approximable within factor b log m, where m is the number of leaves of the species tree and b > 0 is a constant. Furthermore, we introduce and study two new variants of the problem, where the goal is to correct a gene tree with the minimum number of leaf or label modifications (Minimum Leaf Modification and Minimum Label Modification, respectively). We show that the two modification problems, unlike the removal versions, are unlikely to be fixed-parameter tractable. More precisely, we prove that the Minimum Leaf Modification problem is W[1]-hard when parameterized by the number of leaf modifications, and that the Minimum Label Modification problem is W[2]-hard when parameterized by the number of label modifications.

Keywords: Computational Biology, Gene Tree Reconciliation, Gene Tree Correction, Approximation Complexity, Parameterized Complexity
Introduction

Macro-evolutionary events, such as duplications and losses, are crucial for genome evolution [2,3]. In particular, due to duplications, many gene copies can be found inside a genome. A gene family consists of those gene copies originating from duplications of a single gene.

Given a gene family, a first step to understand its evolutionary history is to construct a phylogeny, called a gene tree, that represents the evolution associated with different gene families in a given set of species. Usually, gene trees are built based on the similarity of the associated sequences. Then, the gene tree is compared to a species tree, which is a phylogeny that represents the speciation history of the genomes of the considered species; hence it is based on a model that considers only speciations as evolutionary events. The comparison of a gene tree and a species tree is known as reconciliation [4,5,6,7,8,9,10,11,12,13,14,15], and has the goal of inferring the macro-evolutionary events (duplications, losses, and in some cases lateral gene transfers) that occurred during evolution. When no species tree is known, the problem changes: given a set of possibly discordant gene trees, the goal is to infer a correct species tree, usually based on a parsimonious evolutionary scenario [16,6,17].

It has been observed that reconciliation is highly sensitive to errors in the gene trees. Indeed, a few errors can produce a completely misleading evolutionary scenario, which usually leads to a greater number of duplicat...