“…It is a generalization of the well-known suffix tree of a set of sequences [65,16], which is one of the most important data structures in the field of pattern recognition. Such a suffix tree has $O(n_1 + \cdots + n_k)$ nodes and can be constructed in $O(n_1 + \cdots + n_k)$ time by several algorithms [65,30,58,12,16].…”
Section: Suffix-set Trees (mentioning)
confidence: 99%
“…{D, E, Q, N} C12 {H, R, K} C13 {R, K} C14 {F, W, Y} C15 {G, N} C16 {A, C, G, S} C17 {S, T} C18 {D, E}…”
Section: A Simple Construction Algorithm (mentioning)
confidence: 99%
“…For recent reviews highlighting the importance of multiple alignments in molecular biology, we refer the reader to [16,14,26,11].…”
Section: Introduction (mentioning)
confidence: 99%
“…Feasible approaches to solve the problem are then all of a heuristic nature, as can be seen in [16,46,10,14,42,2,38,39,11].…”
We introduce a new heuristic for the multiple alignment of a set of sequences. The heuristic is based on a set cover of the residue alphabet of the sequences, and also on the determination of a significant set of blocks comprising subsequences of the sequences to be aligned. These blocks are obtained with the aid of a new data structure, called a suffix-set tree, which is constructed from the input sequences with the guidance of the residue-alphabet set cover and generalizes the well-known suffix tree of the sequence set. We provide performance results on selected BAliBASE amino-acid sequences and compare them with those yielded by some prominent approaches.
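To make the idea of a set cover of the residue alphabet concrete, here is a minimal Python sketch. It is not the authors' algorithm: the list `PARTIAL_COVER` only reuses the subsets visible in the excerpt quoted above (the full cover in the paper contains more sets), and the helper names `sets_containing` and `shared_sets` are hypothetical. The sketch only shows how overlapping subsets let several different residues be treated as compatible.

```python
from typing import FrozenSet, Iterable, List

# Illustrative subsets copied from the quoted excerpt. This is NOT the full
# cover: it does not contain every amino acid, and set labels are omitted.
PARTIAL_COVER: List[FrozenSet[str]] = [
    frozenset("DEQN"),
    frozenset("HRK"),
    frozenset("RK"),
    frozenset("FWY"),
    frozenset("GN"),
    frozenset("ACGS"),
    frozenset("ST"),
    frozenset("DE"),
]


def sets_containing(residue: str, cover: List[FrozenSet[str]]) -> List[FrozenSet[str]]:
    """Return every cover set that contains the given residue.

    Unlike a partition, a cover may place one residue in several sets.
    """
    return [s for s in cover if residue in s]


def shared_sets(residues: Iterable[str], cover: List[FrozenSet[str]]) -> List[FrozenSet[str]]:
    """Return the cover sets that contain all of the given residues.

    A non-empty result means the residues can be treated as compatible
    under at least one set of the cover.
    """
    residues = list(residues)
    return [s for s in cover if all(r in s for r in residues)]
```

For example, `shared_sets("RK", PARTIAL_COVER)` returns the two sets that contain both R and K, while `shared_sets("DW", PARTIAL_COVER)` returns an empty list because no listed set contains both residues.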
“…String matching has broad applications, for instance in bibliographic search, lexical analysis, web search engines and, more recently, as a filter for DNA sequence searching [6] [3]. Both single-pattern and multi-pattern matching are important in the string matching application domain.…”
Abstract: DNA sequence similarity search is an important task in computational biology. A similarity search is carried out by aligning a query sequence against target sequences. Optimal alignment based on dynamic programming has $O(nm)$ time and space complexity. Heuristic algorithms align DNA sequences quickly but with lower sensitivity. Biologists frequently require optimal comparison results so that the structure of the evolution of living beings can be reconstructed accurately. This task becomes more complex and challenging as public sequence databases grow very large and continue to increase exponentially each year. The aim of this study is to develop a filtering algorithm that reduces the number of dynamic programming runs, so that a set of similar DNA sequences can be retrieved from a database efficiently. The algorithm filters out database sequences expected to be irrelevant so that they are not passed to the dynamic-programming-based optimal alignment. The filter is automaton-based: a set of random patterns generated from the query sequence is placed in an automaton before exact matching and scoring are performed. Extensive experiments over several parameters show that the filtering algorithm removes unrelated target sequences from alignment with the query sequence. Index Terms: exact string matching, Aho-Corasick algorithm, sequence comparison, Smith-Waterman algorithm.
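The two stages described in this abstract can be sketched as follows, under stated assumptions. The function `smith_waterman_score` is a textbook local-alignment score with a full $(n+1) \times (m+1)$ table, which is where the $O(nm)$ time and space cost comes from; the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are illustrative and not taken from the paper. The function `passes_filter` is a hypothetical stand-in for the filtering step: it uses plain substring search rather than an Aho-Corasick automaton, and its `min_hits` threshold is an assumption.

```python
from typing import Iterable


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score of a and b with a linear gap penalty.

    Fills a full (len(a)+1) x (len(b)+1) table, so time and space are O(nm).
    """
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = table[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = table[i - 1][j] + gap
            left = table[i][j - 1] + gap
            # A local alignment may restart anywhere, hence the floor at 0.
            table[i][j] = max(0, diag, up, left)
            best = max(best, table[i][j])
    return best


def passes_filter(target: str, patterns: Iterable[str], min_hits: int = 1) -> bool:
    """Keep a target only if enough query-derived patterns occur in it exactly.

    Assumption: substring search replaces the Aho-Corasick automaton here,
    which keeps the sketch short but scans the target once per pattern.
    """
    hits = sum(1 for p in patterns if p in target)
    return hits >= min_hits
```

In use, a target sequence would be passed to `smith_waterman_score` only when `passes_filter` returns `True`, so the quadratic alignment is skipped for sequences that share no exact pattern with the query.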
We show that suffix trees store various kinds of redundant information. We exploit these redundancies to obtain more space efficient representations. The most space efficient of our representations requires 20 bytes per input character in the worst case, and 10.1 bytes per input character on average for a collection of 42 files of different type. This is an advantage of more than 8 bytes per input character over previous work. Our representations can be constructed without extra space, and as fast as previous representations. The asymptotic running times of suffix tree applications are retained.