A comprehensive survey of a rapidly expanding field of combinatorial optimization, mathematically oriented but offering biological explanations when required. From one cell to another, from one individual to another, and from one species to another, the content of DNA molecules is often similar. The organization of these molecules, however, differs dramatically, and the mutations that affect this organization are known as genome rearrangements. Combinatorial methods are used to reconstruct putative rearrangement scenarios in order to explain the evolutionary history of a set of species, often formalizing the evolutionary events that can explain the multiple combinations of observed genomes as combinatorial optimization problems. This book offers the first comprehensive survey of this rapidly expanding application of combinatorial optimization. It can be used as a reference for experienced researchers or as an introductory text for a broader audience. Genome rearrangement problems have proved so interesting from a combinatorial point of view that the field now belongs as much to mathematics as to biology. This book takes a mathematically oriented approach, but provides biological background when necessary. It presents a series of models, beginning with the simplest (which is progressively extended by dropping restrictions), each constructing a genome rearrangement problem. The book also discusses an important generalization of the basic problem known as the median problem, surveys attempts to reconstruct the relationships between genomes with phylogenetic trees, and offers a collection of summaries and appendixes with useful additional information.
In comparative genomics, a transposition is an operation that exchanges two consecutive sequences of genes in a genome. The transposition distance, that is, the minimum number of transpositions needed to transform one genome into another, is, according to numerous studies, a relevant evolutionary distance. The problem of computing this distance when genomes are represented by permutations, called the Sorting by Transpositions problem, was introduced by Bafna and Pevzner [3] in 1995. It has naturally been the focus of a number of studies, but the computational complexity of this problem remained undetermined for 15 years. In this paper, we answer this long-standing open question by proving that the Sorting by Transpositions problem is NP-hard. As a corollary of our result, we also prove that the following problem [8] is NP-hard: given a permutation π, is it possible to sort π using d_b(π)/3 transpositions, where d_b(π) is the number of breakpoints of π?
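The notions of transposition, breakpoint, and transposition distance used in this abstract can be illustrated with a minimal Python sketch. It assumes the usual convention for this problem: the permutation is extended with 0 on the left and n+1 on the right, and a breakpoint is a pair of consecutive elements (x, y) with y ≠ x + 1. The function names `breakpoints`, `transpose`, and `transposition_distance` are illustrative and do not come from the paper; the exhaustive search is exponential and is only meant for very small permutations, which is consistent with the problem being NP-hard.

```python
from collections import deque
from itertools import combinations


def breakpoints(perm):
    """Count breakpoints d_b of a permutation of 1..n.

    Assumed convention: extend perm with 0 and n+1, then count
    consecutive pairs (x, y) where y != x + 1.
    """
    ext = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for x, y in zip(ext, ext[1:]) if y != x + 1)


def transpose(perm, i, j, k):
    """Exchange the adjacent blocks perm[i:j] and perm[j:k] (0-based)."""
    if not 0 <= i < j < k <= len(perm):
        raise ValueError("need 0 <= i < j < k <= len(perm)")
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def transposition_distance(perm):
    """Exact distance by breadth-first search over all transpositions.

    Exponential in len(perm); a toy illustration, not a practical method.
    """
    start = tuple(perm)
    target = tuple(sorted(start))
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        current, dist = queue.popleft()
        if current == target:
            return dist
        # Every triple i < j < k defines one transposition.
        for i, j, k in combinations(range(len(current) + 1), 3):
            nxt = transpose(current, i, j, k)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))


if __name__ == "__main__":
    pi = (3, 4, 1, 2)
    print(breakpoints(pi), transposition_distance(pi))
```

A single transposition changes at most three pairs of neighbours, so it removes at most three breakpoints; hence d_b(π)/3 is a lower bound on the distance. The decision problem in the corollary asks whether this bound is met exactly, as it is for (3, 4, 1, 2) in the usage example.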
Angibaud et al., Approximability of Comparing Genomes with Duplicates. Abstract: A central problem in comparative genomics consists in computing a (dis-)similarity measure between two genomes, e.g. in order to construct a phylogenetic tree. A large number of such measures have been proposed in the recent past: number of reversals, number of breakpoints, number of common or conserved intervals, etc. In their initial definitions, all these measures suppose that genomes contain no duplicates. However, we now know that genes can be duplicated within the same genome. One possible approach to overcome this difficulty is to establish a one-to-one correspondence (i.e. a matching) between genes of both genomes, where the correspondence is chosen in order to optimize the studied measure. Then, after a gene relabeling according to this matching and a deletion of the unmatched signed genes, two genomes without duplicates are obtained and the measure can be computed. In this paper, we are interested in three measures (number of breakpoints, number of common intervals and number of conserved intervals) and three models of matching (exemplar, intermediate and maximum matching models). We prove that, for each model and each measure M, computing a matching between two genomes that optimizes M is APX-hard. We show that this result remains true even for two genomes G1 and G2 such that G1 contains no duplicates and no gene of G2 appears more than twice. Therefore, our results extend those of [7,10,13]. Besides, in order to evaluate the possible existence of approximation algorithms concerning the number of breakpoints, we also study the complexity of the following decision problem: is there an exemplarization (resp. an intermediate matching, a maximum matching) that induces no breakpoint?
In particular, we extend a result of [13] by proving the problem to be NP-complete in the exemplar model for a new class of instances; we note that the problems are equivalent in the intermediate and the exemplar models; and we show that the problem is in P in the maximum matching model. Finally, we focus on a fourth measure, closely related to the number of breakpoints: the number of adjacencies, for which we give several constant-ratio approximation algorithms in the maximum matching model, in the case where genomes contain the same number of duplications of each gene.
Comparing genomes of different species is a fundamental problem in comparative genomics. Recent research has resulted in the introduction of different measures between pairs of genomes: reversal distance, number of breakpoints, number of common or conserved intervals, etc. However, classical methods used for computing such measures […] scattered across them. Most approaches to overcome this difficulty are based either on the exemplar model, which keeps exactly one copy in each genome of each duplicated gene, or on the maximum matching model, which keeps as many copies as possible of each duplicated gene. The goal is to find an exemplar matching, respectively a maximum matching, that optimizes the studied measure. Unfortunately, it turns out that, in the presence of duplications, this problem is NP-hard for each of the above-mentioned measures. In this paper, we propose to compute the minimum number of breakpoints and the maximum number of adjacencies between two genomes in the presence of duplications using two different approaches. The first is an exact, generic 0-1 linear programming approach, while the second is a collection of three heuristics. Each of these approaches is applied to each problem and for each of the following models: exemplar, maximum matching, and the intermediate model, which we introduce here. All these programs are run on a well-known public benchmark dataset of γ-Proteobacteria, and their performance is discussed.
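To make the exemplar model concrete, the following Python sketch enumerates every exemplarization of two small genomes and returns the minimum number of breakpoints. It is a brute-force illustration only, not the 0-1 linear program or the heuristics of the paper. It assumes genomes are lists of nonzero signed integers, that both genomes contain the same gene families, and that a consecutive pair (a, b) of the first genome is a breakpoint unless (a, b) or (-b, -a) is consecutive in the second; genome ends are ignored for simplicity. All function names are hypothetical.

```python
from itertools import product


def breakpoints_between(g1, g2):
    """Breakpoints of g1 with respect to g2 (both duplicate-free).

    Assumed definition: (a, b) consecutive in g1 is an adjacency if
    (a, b) or (-b, -a) is consecutive in g2, and a breakpoint otherwise.
    """
    adjacent_in_g2 = set()
    for a, b in zip(g2, g2[1:]):
        adjacent_in_g2.add((a, b))
        adjacent_in_g2.add((-b, -a))
    return sum(1 for pair in zip(g1, g1[1:]) if pair not in adjacent_in_g2)


def exemplarizations(genome):
    """Yield every genome that keeps exactly one copy of each gene family."""
    positions = {}
    for index, gene in enumerate(genome):
        positions.setdefault(abs(gene), []).append(index)
    # Pick one position per family, then restore the original gene order.
    for choice in product(*positions.values()):
        yield [genome[i] for i in sorted(choice)]


def exemplar_breakpoint_distance(g1, g2):
    """Minimum breakpoints over all pairs of exemplarizations (exponential)."""
    if {abs(g) for g in g1} != {abs(g) for g in g2}:
        raise ValueError("both genomes must contain the same gene families")
    return min(
        breakpoints_between(e1, e2)
        for e1 in exemplarizations(g1)
        for e2 in exemplarizations(g2)
    )


if __name__ == "__main__":
    # Gene 3 is duplicated in the second genome; keeping its second copy
    # yields two identical duplicate-free genomes.
    print(exemplar_breakpoint_distance([1, 2, 3, 4], [1, 3, 2, 3, 4]))
```

The number of exemplarizations is the product of the copy counts of all gene families, so the enumeration grows exponentially with the number of duplicated genes, which is why exact approaches rely on formulations such as 0-1 linear programming rather than exhaustive search.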