Abstract: Genome rearrangement problems have been extensively studied due to their importance in biology. Most studied models assumed a single copy per gene. However, in reality, duplicated genes are common, most notably in cancer. In this study, we make a step toward handling duplicated genes by considering a model that allows the atomic operations of cut, join, and whole chromosome duplication. Given two linear genomes, [Formula: see text] with one copy per gene and [Formula: see text] with two copies per gene, we giv…
“…The most natural one asks if our model can be extended to include other kinds of duplications, other than single-gene duplications. It was shown in [20] that Whole-Chromosome Duplications can be handled, although it is much more complicated to compute the distance. It is then relevant to ask if an intermediate model accounting for a wider range of duplication mechanisms can lead to tractable distance problems.…”
Section: Discussion (mentioning)
confidence: 99%
“…For example, whereas the distance between two genomes can be computed in linear time for genomes without duplicated genes under the Double-Cut and Join (DCJ) model, it becomes NP-complete to compute the distance when duplicated genes are considered [15,16], although it can be approximated when the gene content in both genomes is balanced [17]. So far, even in simpler genome rearrangement models, the general problem of computing a distance with duplicated genes is difficult [18,19], with the exception of polynomial time algorithms for two extensions of the SCJ model that include large-scale duplications: the SCJ double distance [12], where duplicated genes occur through a WGD, and the SCJ and whole chromosome duplication (WCD) problem, motivated by cancer genomics [20].…”
Section: Algorithms For Molecular Biology (mentioning)
Background.
In the field of genome rearrangement algorithms, models accounting for gene duplication often lead to hard problems. For example, while computing the pairwise distance is tractable in most duplication-free models, the problem is NP-complete for most extensions of these models accounting for duplicated genes. Moreover, problems involving more than two genomes, such as the genome median and the Small Parsimony problem, are intractable for most duplication-free models, with some exceptions, for example the Single-Cut-or-Join (SCJ) model.
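As background for the duplication-free case mentioned above, the classic SCJ distance between two genomes with the same gene content is the size of the symmetric difference of their adjacency sets. The following is a minimal sketch of that classic distance only, not of the directed distance with duplications introduced in this paper. The genome encoding (lists of signed integers, linear chromosomes only) and the function names `adjacencies` and `scj_distance` are assumptions made for illustration.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical encoding: a genome is a list of linear chromosomes,
# each chromosome a list of signed gene identifiers (e.g. [1, -2, 3]).
Extremity = Tuple[int, str]          # (gene, 'h' for head or 't' for tail)
Adjacency = FrozenSet[Extremity]
Genome = List[List[int]]


def adjacencies(genome: Genome) -> Set[Adjacency]:
    """Collect the adjacencies (pairs of neighbouring gene extremities)."""
    result: Set[Adjacency] = set()
    for chrom in genome:
        for left, right in zip(chrom, chrom[1:]):
            # A gene read forward is tail -> head, so its right end is its head;
            # a reversed gene exposes its tail on the right.
            a = (abs(left), "h" if left > 0 else "t")
            # Symmetrically, the left end of a forward gene is its tail.
            b = (abs(right), "t" if right > 0 else "h")
            result.add(frozenset((a, b)))
    return result


def scj_distance(g1: Genome, g2: Genome) -> int:
    """Classic SCJ distance: cuts needed (adjacencies only in g1)
    plus joins needed (adjacencies only in g2)."""
    a1, a2 = adjacencies(g1), adjacencies(g2)
    return len(a1 - a2) + len(a2 - a1)


if __name__ == "__main__":
    # Inverting gene 2 breaks two adjacencies and creates two new ones.
    print(scj_distance([[1, 2, 3]], [[1, -2, 3]]))
```

Because each adjacency is stored as an unordered pair of extremities, a chromosome and its reverse complement yield the same adjacency set, so the distance does not depend on reading direction.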
Results.
We introduce a variant of the SCJ distance that accounts for duplicated genes, in the context of directed evolution from an ancestral genome to a descendant genome where orthology relations between ancestral genes and their descendants are known. Our model includes two duplication mechanisms: single-gene tandem duplication and the creation of single-gene circular chromosomes. We prove that in this model, computing the directed distance and a parsimonious evolutionary scenario in terms of SCJ and single-gene duplication events can be done in linear time. We also show that the directed median problem is tractable for this distance, while the rooted median problem, where we assume that one of the given genomes is ancestral to the median, is NP-complete. We also describe an Integer Linear Program for solving the rooted median problem. We evaluate the directed distance and rooted median algorithms on simulated data.
Conclusion.
Our results provide a simple genome rearrangement model, extending the SCJ model to account for single-gene duplications, for which we prove a mix of tractability and hardness results. For the NP-complete rooted median problem, we design a simple Integer Linear Program. Our publicly available implementation of these algorithms for the directed distance and median problems allows these problems to be solved efficiently on large instances.
Permutation codes have recently garnered substantial research interest due to their potential in various applications, including cloud storage systems, genome resequencing, and flash memories. In this paper, we study the theoretical bounds and constructions of permutation codes in the generalized Cayley metric. The generalized Cayley metric captures the number of generalized transposition errors in a permutation, and subsumes previously studied error types, including transpositions and translocations, without imposing restrictions on the lengths and positions of the translocated segments. Based on the so-called breakpoint analysis method proposed by Chee and Vu, we first present a coding framework that leads to order-optimal constructions, thus improving upon the existing constructions that are not order-optimal. We then use this framework to also develop an order-optimal coding scheme that is additionally explicit and systematic.
Background: In the field of genome rearrangement algorithms, models accounting for gene duplication often lead to hard problems. For example, while computing the pairwise distance is tractable in most duplication-free models, it is NP-complete for most extensions of these models accounting for duplicated genes. Moreover, problems involving more than two genomes, such as the genome median and the Small Parsimony problem, are intractable for most duplication-free models, with some exceptions, for example the Single-Cut-or-Join (SCJ) model. Result: We introduce a variant of the SCJ distance that accounts for duplicated genes, in the context of directed evolution from an ancestral genome to a descendant genome where orthology relations between ancestral genes and their descendants are known. Our model includes two duplication mechanisms: single-gene tandem duplication and the creation of single-gene circular chromosomes. We prove that in this model, computing the directed distance and a parsimonious evolutionary scenario in terms of SCJ and single-gene duplication events can be done in linear time. We also show that the directed median problem is tractable for this distance, while the rooted median problem, where we assume that one of the given genomes is ancestral to the median, is NP-complete. We also describe an Integer Linear Program for solving the rooted median problem. We evaluate the directed distance and rooted median algorithms on simulated data. Conclusion: Our results provide a simple genome rearrangement model, extending the SCJ model to account for single-gene duplications, for which we prove a mix of tractability and hardness results. For the NP-complete rooted median problem, we design a simple Integer Linear Program. Our publicly available implementation of these algorithms for the directed distance and median problems allows these problems to be solved efficiently on large instances. Availability: https://github.com/cchauve/SCJ-with-SGD