Proceedings of the Fourth Annual International Conference on Computational Molecular Biology 2000
DOI: 10.1145/332306.332553

Extracting structured motifs using a suffix tree—algorithms and application to promoter consensus identification

Abstract: This paper introduces two exact algorithms for extracting conserved structured motifs from a set of DNA sequences. Structured motifs are composed of p ≥ 2 parts separated by constrained spacers. These algorithms use a suffix tree for fulfilling this task. They are efficient enough to be able to extract site consensus, such as promoter sequences, from a whole collection of non-coding sequences extracted from a genome. In particular, their time complexity scales linearly with N²n, where n is the average length of t…
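To make the notion of a structured motif concrete, the following is a minimal brute-force sketch for the two-part case. It is not the paper's suffix-tree algorithm; it only checks whether a given pair of boxes, separated by a spacer of constrained length, occurs in each sequence. The function names, the Hamming-distance error model, and the example sequences are assumptions introduced for illustration.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs(seq: str, box1: str, box2: str, dmin: int, dmax: int, e: int) -> bool:
    """True if box1 and box2 both occur in seq (each with at most e
    mismatches) with a spacer of dmin..dmax letters between them."""
    k1, k2 = len(box1), len(box2)
    for i in range(len(seq) - k1 + 1):
        if hamming(seq[i:i + k1], box1) > e:
            continue
        for d in range(dmin, dmax + 1):
            j = i + k1 + d  # start of the second box
            if j + k2 > len(seq):
                break
            if hamming(seq[j:j + k2], box2) <= e:
                return True
    return False


def support(seqs: List[str], box1: str, box2: str,
            dmin: int, dmax: int, e: int) -> int:
    """Number of sequences containing at least one occurrence of the model."""
    return sum(occurs(s, box1, box2, dmin, dmax, e) for s in seqs)


# Hypothetical toy sequences (not taken from the paper).
seqs = ["AATTGACAGGGGGTATAATCC", "TTGACTCCCCCCTATAATAA", "GGGGGGGG"]
exact = support(seqs, "TTGACA", "TATAAT", 4, 6, 0)
one_error = support(seqs, "TTGACA", "TATAAT", 4, 6, 1)
```

A motif model would be reported when its support reaches a chosen quorum. Enumerating candidate models this way is exponential in the box length, which is the cost an index such as a suffix tree is meant to reduce.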

Cited by 66 publications (43 citation statements)
References 19 publications
“…A resemblance exists between this structure and the work related to regulatory motifs [71,66,83,60] and probabilistic suffix trees [78,82,69]. Regulatory motifs characterize short sequences of DNA and determine the timing, location, and level of gene expression, and the approaches extracting regulatory motifs can be divided into two categories: those that exploit word-counting heuristics [57,69] and those based on the use of probabilistic models [40,48,64,79,85,87]; in the second category of approaches, the motifs are represented by position probabilistic matrices, whereas the remainder of the sequences are represented by background models. The probabilistic or prediction suffix tree is basically a stochastic model that employs a suffix tree as its index structure to represent compactly the conditional probabilities distribution for a cluster of sequences.…”
Section: Index Structures For Weighted Strings
mentioning
confidence: 84%
“…The probabilistic or prediction suffix tree is basically a stochastic model that employs a suffix tree as its index structure to represent compactly the conditional probabilities distribution for a cluster of sequences. Each node of a probabilistic suffix tree is associated with a probability vector that stores the probability distribution for the next symbol given the label of the node as the preceding segment, and algorithms that use probabilistic suffix trees to process regulatory motifs can be found in [82,69]. However, the probabilistic suffix tree is inefficient for handling weighted sequences, which is why the weighted suffix tree was introduced; however, it could be possible for a suitable combination of the two structures to be effective to handle both problem categories.…”
Section: Index Structures For Weighted Strings
mentioning
confidence: 99%
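The per-node probability vector described in the statement above can be sketched as follows. This is a simplified illustration, assuming a plain dictionary keyed by context string rather than a pruned tree, and it is not the method of any cited work; `build_context_model` and `predict` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def build_context_model(seqs: List[str], max_depth: int) -> Dict[str, Dict[str, float]]:
    """Map each context (a preceding segment of length 0..max_depth) to the
    empirical distribution of the symbol that follows it."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for s in seqs:
        for i in range(len(s)):
            for k in range(max_depth + 1):
                if i - k < 0:
                    break
                counts[s[i - k:i]][s[i]] += 1
    return {
        ctx: {sym: c / sum(ctr.values()) for sym, c in ctr.items()}
        for ctx, ctr in counts.items()
    }


def predict(model: Dict[str, Dict[str, float]], history: str,
            max_depth: int) -> Dict[str, float]:
    """Next-symbol distribution for the longest suffix of history that is a
    known context; falls back to shorter suffixes, down to the empty one."""
    for k in range(min(max_depth, len(history)), -1, -1):
        ctx = history[len(history) - k:]
        if ctx in model:
            return model[ctx]
    return {}


# Hypothetical toy input.
model = build_context_model(["ACGACGACT"], max_depth=2)
after_ac = predict(model, "GGAC", max_depth=2)
```

In this toy input the context "AC" is followed by G twice and T once, so the stored vector for "AC" assigns G twice the probability of T. A real probabilistic suffix tree would additionally prune contexts that are rare or that add no predictive information.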
“…Recently, several methods have been suggested to identify occurrences of known CRMs (Berman et al., 2002; Frith et al., 2001) and to find novel CRMs given a database of known motifs (Sharan et al., 2003; Kel-Margoulis et al., 2002; Aerts et al., 2003), but these methods are restricted to TFs whose binding sites have been previously characterized. To date, we are aware of only one approach that tries to identify novel CRMs and at the same time learn their component motifs de novo (Marsan and Sagot, 2000). A shortcoming of the latter approach is that it is based on a consensus sequence representation of a motif, which has less expressive power compared to the more widely used position weight matrix model.…”
Section: Identifying Spatial Cis-regulatory Modules
mentioning
confidence: 99%
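The contrast drawn in the statement above, between a consensus sequence and a position weight matrix, can be illustrated with a small sketch. The aligned sites, the pseudocount, and the function names below are assumptions chosen for the example, not data or code from any cited work.

```python
from collections import Counter
from typing import Dict, List

ALPHABET = "ACGT"


def column_counts(sites: List[str]) -> List[Counter]:
    """Per-column base counts for a list of equal-length aligned sites."""
    length = len(sites[0])
    assert all(len(s) == length for s in sites), "sites must be aligned"
    return [Counter(s[i] for s in sites) for i in range(length)]


def consensus(sites: List[str]) -> str:
    """Most frequent base per column; ties go to the first base in ACGT order."""
    return "".join(max(ALPHABET, key=lambda b: col[b]) for col in column_counts(sites))


def position_frequencies(sites: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Per-column base frequencies with a pseudocount added to every base."""
    n = len(sites)
    return [
        {b: (col[b] + pseudocount) / (n + 4 * pseudocount) for b in ALPHABET}
        for col in column_counts(sites)
    ]


# Hypothetical aligned binding sites.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
cons = consensus(sites)
freqs = position_frequencies(sites)
```

The consensus keeps only the majority base in each column, whereas the frequency matrix also records that the third column tolerates a C in some sites. That retained per-column variation is the extra expressive power the statement refers to.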
“…Some recent methods attempt to incorporate site-clustering information with de novo motif discovery by building a rule to discriminate modules preserving a certain ordering of motifs from sequences with random occurrences of motifs (20,21). However, these methods do not explicitly specify a probability model and impose restrictive conditions such as a known number of motifs in the module or a known number of occurrences of each motif in the module.…”
mentioning
confidence: 99%