Abstract-We describe a theoretical unifying framework to express comparison of RNA structures, which we call alignment hierarchy. This framework relies on the definition of common supersequences for arc-annotated sequences, and encompasses main existing models for RNA structure comparison based on trees and arc-annotated sequences with a variety of edit operations. It also gives rise to edit models that have not been studied yet. We provide a thorough analysis of the alignment hierarchy, including a new polynomial time algorithm and an NP-completeness proof. The polynomial time algorithm involves biologically relevant evolutionary operations, such as pairing or unpairing nucleotides. It has been implemented in a software, called gardenia that is available at the web server http://bioinfo.lifl.fr/RNA/gardenia.
Abstract.Recently, several studies taking into account the ability for a gene to be absent or to have some copies in genomes have been proposed, as the examplar distance [11,6] or the gene matching computation between two genomes [10,3]. In this paper, we study the time complexity of the conserved interval distance computation considering duplicated genes using both those two strategies.
A preliminary step to most comparative genomics studies is the annotation of chromosomes as ordered sequences of genes. Different genetic mapping techniques often give rise to different maps with unequal gene content and sets of unordered neighboring genes. Only partial orders can thus be obtained from combining such maps. However, once a total order O is known for a given genome, it can be used as a reference to order genes of a closely related species characterized by a partial order P . Our goal is to find a linearization of P that is as close as possible to O, in term of a given genomic distance. We first prove NP-completeness complexity results considering the breakpoint and the common interval distances. We then focus on the breakpoint distance and give a dynamic programming algorithm whose running time is exponential for general partial orders, but polynomial when the partial order is derived from a bounded number of genetic maps. A time-efficient greedy heuristic is then given for the general case and is empirically shown to produce solutions within 10% of the optimal solution, on simulated data. Applications to the analysis of grass genomes are presented.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.