The tandem duplication random loss operation (TDRL) is an important genome rearrangement operation in metazoan mitochondrial genomes. A TDRL consists of a duplication of a contiguous set of genes in tandem followed by a random loss of one copy of each duplicated gene. This paper presents an analysis of the combinatorics of TDRLs on circular genomes, e.g., the mitochondrial genome. In particular, results on TDRLs for circular genomes and their linear representatives are established. Moreover, the distance between gene orders with respect to linear TDRLs and circular TDRLs is studied. An analysis of the available animal mitochondrial gene orders shows the practical relevance of the theoretical results.
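The definition above can be illustrated with a minimal sketch of a single TDRL on a linear gene order. The function name `apply_tdrl`, its parameters, and the gene labels are hypothetical and do not come from the paper. The sketch assumes that each gene occurs exactly once and that the caller chooses which duplicated genes survive in the first copy. Circular genomes are not handled.

```python
from typing import List, Sequence, Set


def apply_tdrl(order: Sequence[str], start: int, end: int,
               keep_first: Set[str]) -> List[str]:
    """Apply one TDRL to the contiguous segment order[start:end].

    The segment is duplicated in tandem. Genes listed in keep_first
    survive in the first copy; every other gene of the segment
    survives in the second copy. (Hypothetical helper, linear case only.)
    """
    segment = list(order[start:end])
    if not keep_first <= set(segment):
        raise ValueError("keep_first must only contain genes of the segment")

    # After the loss step, each copy retains its genes in their original order.
    first_copy = [g for g in segment if g in keep_first]
    second_copy = [g for g in segment if g not in keep_first]

    return list(order[:start]) + first_copy + second_copy + list(order[end:])


if __name__ == "__main__":
    genes = ["a", "b", "c", "d", "e"]
    # Duplicate the segment b c d, keep b and d in the first copy, c in the second.
    print(apply_tdrl(genes, 1, 4, {"b", "d"}))  # ['a', 'b', 'd', 'c', 'e']
```

Because every gene is lost from exactly one copy, the result is always a permutation of the input; keeping all duplicated genes in the first copy leaves the order unchanged.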
A tandem duplication random loss (TDRL) operation duplicates a contiguous segment of genes and then randomly loses one copy of each duplicated gene. Although the importance of this operation is supported by several recent biological studies, it has rarely been investigated from a theoretical point of view. Of particular interest are sorting TDRLs, that is, TDRLs that, when applied to a permutation representing a genome, reduce the distance to another given permutation. The identification of sorting genome rearrangement operations in general is a key ingredient of many algorithms for reconstructing the evolutionary history of a set of species. In this paper we present methods to compute all sorting TDRLs for two given gene orders. In addition, a closed formula for the number of sorting TDRLs is derived and further properties of sorting TDRLs are investigated. It is also shown that the theoretical findings are useful for identifying unique sorting TDRL scenarios for mitochondrial gene orders.
Quasi-biclique mining for bipartite graphs has found important applications in providing security services. However, the standard MapReduce algorithm for mining quasi-bicliques does not scale well because it must shuffle and reduce a huge number of map outputs. To cope with web-scale graphs, we propose a scalable algorithm based on Giraph, an emerging large-scale graph processing platform that follows the bulk synchronous parallel (BSP) model. Experimental results on real-world domain-IP graphs demonstrate that our proposed solution reduces CPU time by 80% and disk I/O by 95% compared with the standard MapReduce algorithm.