In this paper, we design a heuristic algorithm for computing a constrained multiple sequence alignment (CMSA for short), which guarantees that the generated alignment satisfies user-specified constraints requiring that particular residues be aligned together. If the number of residues that must be aligned together is a constant α, then the time complexity of our CMSA algorithm for aligning K sequences is O(αKn^4), where n is the maximum sequence length. In addition, we have built a CMSA software system and performed several experiments on RNase sequences, whose main function is to catalyze the degradation of RNA molecules. The resulting alignments illustrate the practicability of our method.
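The constraint described above, that particular residues end up in the same alignment column, can be stated as a small check. The sketch below is not the CMSA algorithm itself; it only verifies the property on a finished alignment. The representation is an assumption made for illustration: the alignment is a list of equal-length gapped strings, and each constraint is a tuple giving one 0-based residue index (counted in the ungapped sequence) per sequence. The function names are hypothetical.

```python
def residue_columns(row, gap="-"):
    """Return, for each residue of one gapped row, the column it occupies."""
    return [col for col, ch in enumerate(row) if ch != gap]


def satisfies_constraints(alignment, constraints, gap="-"):
    """Check that every constraint's residues share a single alignment column.

    alignment   -- list of gapped strings, all of the same length (assumed)
    constraints -- list of tuples; constraint[k] is the 0-based index of the
                   constrained residue in the ungapped k-th sequence (assumed)
    """
    cols = [residue_columns(row, gap) for row in alignment]
    for constraint in constraints:
        # Collect the column of the constrained residue in each sequence.
        columns = {cols[k][pos] for k, pos in enumerate(constraint)}
        if len(columns) != 1:
            return False
    return True


alignment = ["AC-GT", "A-CGT"]
print(satisfies_constraints(alignment, [(2, 2)]))  # the two G residues
print(satisfies_constraints(alignment, [(1, 1)]))  # the two C residues
```

In this toy alignment the two G residues share a column while the two C residues do not, so the first constraint holds and the second fails.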
Expressed Sequence Tags (ESTs) are widely used for the discovery of new genes, particularly those involved in human disease processes. A subsequence in an EST dataset is unique if it appears in exactly one EST sequence of the dataset and in no other. Unique subsequences can be regarded as signatures that distinguish an EST from all the others, and they provide valuable information for many applications, such as PCR primer design and microarray experiments. Discovering unique signatures in large-scale EST datasets has previously been a computational challenge. In this paper, we propose two efficient algorithms to extract the unique signatures from EST databases. The algorithms show high discovery efficiency in experiments on real human ESTs.
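To make the definition of a unique subsequence concrete, here is a minimal brute-force sketch. It is not either of the two algorithms proposed in the paper, which the abstract does not describe; it simply indexes every substring of a fixed length k and keeps those that occur in exactly one EST. The fixed length k, the function name, and the toy sequences are assumptions made for illustration.

```python
from collections import defaultdict


def unique_signatures(ests, k):
    """Map each EST index to the length-k substrings found in no other EST."""
    owners = defaultdict(set)  # substring -> indices of ESTs that contain it
    for idx, seq in enumerate(ests):
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(idx)

    result = {idx: set() for idx in range(len(ests))}
    for kmer, idxs in owners.items():
        # A substring is a signature only if exactly one EST contains it;
        # repeated occurrences inside that one EST are still allowed.
        if len(idxs) == 1:
            (idx,) = idxs
            result[idx].add(kmer)
    return result


ests = ["ACGT", "CGTA", "GGGT"]
for idx, sigs in unique_signatures(ests, 3).items():
    print(idx, sorted(sigs))
```

This approach stores every length-k substring of the dataset, so its memory use grows with the total input size; it is meant only to illustrate the definition, not to scale to large EST databases.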