We generalize some current approaches for RNA tree alignment, which are traditionally confined to ordered rooted mappings, to also consider unordered unrooted mappings. We define the Homeomorphic Subtree Alignment problem (HSA), and present a new algorithm which applies to several modes, combining global or local, ordered or unordered, and rooted or unrooted tree alignments. Our algorithm generalizes previous algorithms that either solved the problem in an asymmetric manner, or were restricted to the rooted and/or ordered cases. Focusing here on the most general unrooted unordered case, we show that for input trees T and S, our algorithm has an O(nTnS + min(dT,dS)LTLS) time complexity, where nT,LT and dT are the number of nodes, the number of leaves, and the maximum node degree in T, respectively (satisfying dT ≤ LT ≤ nT), and similarly for nS,LS and dS with respect to the tree S. This improves the time complexity of previous algorithms for less general variants of the problem.In order to obtain this time bound for HSA, we developed new algorithms for a generalized variant of the Min-Cost Bipartite Matching problem (MCM), as well as to two derivatives of this problem, entitled All-Cavity-MCM and All-Pairs-Cavity-MCM. For two input sets of size n and m, where n ≤ m, MCM and both its cavity derivatives are solved in O(n3 + nm) time, without the usage of priority queues (e.g. Fibonacci heaps) or other complex data structures. This gives the first cubic time algorithm for All-Pairs-Cavity-MCM, and improves the running times of MCM and All-Cavity-MCM problems in the unbalanced case where n ≪ m.We implemented the algorithm (in all modes mentioned above) as a graphical software tool which computes and displays similarities between secondary structures of RNA given as input, and employed it to a preliminary experiment in which we ran all-against-all inter-family pairwise alignments of RNAse P and Hammerhead RNA family members, exposing new similarities which could not be detected by the traditional rooted ordered alignment approaches. The results demonstrate that our approach can be used to expose structural similarity between some RNAs with higher sensitivity than the traditional rooted ordered alignment approaches. Source code and web-interface for our tool can be found in http://www.cs.bgu.ac.il/\~negevcb/FRUUT.
International audienceWe define a novel variation on the constrained sequence alignment problem in which the constraint is given in the form of a regular expression. Given two sequences, an alphabet $\Gamma$ describing pairwise sequence alignment operations, and a regular expression $R$ over $\Gamma$, the problem is to compute the highest scoring sequence alignment $A$ of the given sequences, such that $A \in \Gamma^* L(R) \Gamma^* $. Two algorithms are given for solving this problem. The first basic algorithm is general and solves the problem in $O(nmr \log^2 r)$ time and $O(\min\{n,m\}r)$ space, where $m$ and $n$ are the lengths of the two sequences and $r$ is the size of the NFA for $R$. The second algorithm is restricted to patterns $P$ that do not contain the Kleene-closure star, and exploits this constraint to reduce the NFA size factor $r$ in the time complexity to a smaller factor $|P|$. $|P|$ is compacted by supporting alignment patterns extended by \emph{meta-characters} including general insertion, deletion and match operations, as well as some cases of substitutions. For a regular expression $P=P_1\cup\ldots\cup P_k$, these time bounds range from $O(knm)$ to $O(knm\log(\max\{|P_i|\}))$, depending on the meta-characters used in $P$. An additional result obtained along the way is an extension of the algorithm of Fischer and Paterson for String Matching with Wildcards. Our extension allows the input strings to include "negation symbols" (that match all letters but a specific one) while retaining the original time complexity. We implemented both algorithms and applied them to data-mine new miRNA seeding patterns in \textit{C. elegans} Clip-seq experimental data
The discovery and functional analysis of noncoding RNA (ncRNA) systems in different organisms motivates the development of tools for aiding ncRNA research. Several tools exist that search for occurrences of a given RNA structural profile in genomic sequences. Yet, there is a need for an "RNA BLAST" tool, i.e., a tool that takes a putative functional RNA sequence as input, and efficiently searches for similar sequences in genomic databases, taking into consideration potential secondary structure features of the input query sequence. This work aims at providing such a tool. Our tool, denoted StemSearch, is based on a structural representation of an RNA sequence by its potential stems. Potential stems in genomic sequences are identified in a preprocessing stage, and indexed. A user-provided query sequence is likewise processed, and stems from the target genomes that are similar to the query stems are retrieved from the index. Then, relevant genomic regions are identified and ranked according to their similarity to the query stem-set while enforcing conservation of cross-stem topology. Experiments using RFAM families show significantly improved recall for StemSearch over BLAST, with small loss of precision. We further demonstrate our system's capability to handle eukaryotic genomes by successfully searching for members of the 7SK family in chromosome 2 of the human genome. StemSearch is freely available on the web at: http://www.cs.bgu.ac.il/∼negevcb/StemSearch.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.