Algorithms for Extracting Structured Motifs Using a Suffix Tree with an Application to Promoter and Regulatory Site Consensus Identification

Marsan, Laurent; Sagot, Marie‐France

doi:10.1089/106652700750050826

Cited by 207 publications

(182 citation statements)

References 28 publications

(31 reference statements)

Mentioning: 172


“…Algorithms for structured motifs extraction [6] address the extraction of consensus motifs that appear together in a well-ordered and regularly spaced manner. A structured motif can be described as an ordered collection of p ≥ 1 boxes, a maximum allowed error (substitutions) for each box, and an interval of distance for each pair of consecutive boxes.…”

Section: Preliminaries (mentioning)

confidence: 99%
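The definition quoted above can be illustrated with a minimal sketch. It is not the suffix-tree algorithm of the cited paper, only a naive check of whether a given structured motif occurs in a sequence. The names `Box`, `StructuredMotif`, and `occurrences` are hypothetical, and the sketch assumes that the distance between two consecutive boxes is measured from the end of one box to the start of the next.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    pattern: str     # consensus string for this box
    max_errors: int  # maximum number of substitutions allowed


@dataclass
class StructuredMotif:
    boxes: List[Box]             # ordered collection of p >= 1 boxes
    gaps: List[Tuple[int, int]]  # (min, max) distance for each pair of consecutive boxes


def hamming(a: str, b: str) -> int:
    """Number of substitutions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def box_matches(seq: str, pos: int, box: Box) -> bool:
    """True if the box matches seq at pos within its error bound."""
    window = seq[pos:pos + len(box.pattern)]
    if len(window) != len(box.pattern):
        return False  # window runs off the end of the sequence
    return hamming(window, box.pattern) <= box.max_errors


def occurs_at(seq: str, start: int, motif: StructuredMotif, index: int = 0) -> bool:
    """True if boxes index..p-1 of the motif occur with box `index` starting at `start`."""
    box = motif.boxes[index]
    if not box_matches(seq, start, box):
        return False
    if index == len(motif.boxes) - 1:
        return True
    end = start + len(box.pattern)
    dmin, dmax = motif.gaps[index]
    # Assumption: the gap is counted from the end of this box to the start of the next.
    return any(occurs_at(seq, end + d, motif, index + 1) for d in range(dmin, dmax + 1))


def occurrences(seq: str, motif: StructuredMotif) -> List[int]:
    """Start positions of all occurrences of the structured motif in seq."""
    return [i for i in range(len(seq)) if occurs_at(seq, i, motif)]


# Illustrative two-box motif: at most one substitution per box, spacing of 2 to 4.
motif = StructuredMotif(
    boxes=[Box("TTGACA", 1), Box("TATAAT", 1)],
    gaps=[(2, 4)],
)
print(occurrences("GGTTGACACCCTATAATGG", motif))
```

This brute-force scan only localizes a known motif; extracting unknown motifs de novo is the harder problem that the cited algorithms address.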

“…No constraint, and therefore no statistical value is put on the distances separating them. This paper is based on the only previous algorithm that is able to identify motifs composed of any number of boxes -structured motifs [6]. There are two central problems concerning motifs in sequences: localization and extraction [13].…”

Section: Introduction (mentioning)

confidence: 99%

“…The goal of the structured motif localization is to find the positions in a sequence of the occurrences of a given structured motif [14]. The structured motif extraction aims to identify de novo structured motifs in a set of input sequences [6]. In this paper, we address the structured motif extraction problem.…”

Section: Introduction (mentioning)

confidence: 99%

“…Like the previous algorithm [6], this new exact algorithm infers all structured motifs which match a minimum number of input sequences. The latter number is called quorum.…”

Section: Introduction (mentioning)

confidence: 99%
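The quorum condition mentioned above can be sketched as follows. This is a simplified illustration for a single-box motif under substitutions only, not the exact algorithm of either paper; the function names `support` and `satisfies_quorum` are hypothetical.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of substitutions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_in(seq: str, pattern: str, max_errors: int) -> bool:
    """True if pattern matches some window of seq with at most max_errors substitutions."""
    k = len(pattern)
    return any(
        hamming(seq[i:i + k], pattern) <= max_errors
        for i in range(len(seq) - k + 1)
    )


def support(pattern: str, sequences: List[str], max_errors: int) -> int:
    """Number of input sequences in which the pattern occurs at least once."""
    return sum(occurs_in(seq, pattern, max_errors) for seq in sequences)


def satisfies_quorum(pattern: str, sequences: List[str], max_errors: int, quorum: int) -> bool:
    """True if the pattern matches at least `quorum` of the input sequences."""
    return support(pattern, sequences, max_errors) >= quorum


# Illustrative input: the pattern occurs in the first two sequences only.
sequences = ["ACGTTGACAT", "GGTTGTCAGG", "CCCCCCCCCC"]
print(support("TTGACA", sequences, 1))
print(satisfies_quorum("TTGACA", sequences, 1, 2))
```

Each sequence counts at most once toward the quorum, however many occurrences it contains.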


Efficient Extraction of Structured Motifs Using Box-Links

Carvalho, Freitas, Oliveira et al. 2004

String Processing and Information Retrieval

Self Cite


In this paper, we propose a new algorithm for the extraction of repeated motifs that may represent binding-site consensi in genomic sequences. In particular, the algorithm extracts structured motifs, which we define as a collection of highly conserved motifs with pre-specified sizes and spacings between them. This type of motifs is highly relevant in the search for gene regulatory mechanisms since promoter models can be effectively represented by structured motifs. The algorithm uses factor trees, a variation of suffix trees, and a new data structure, called box-links, to store the information about conserved regions that repeat often in the dataset sequences. The complexity analysis shows a gain over previous algorithms that is exponential on the spacings between boxes. The application of a prototype implementation of this algorithm to biologically relevant datasets shows the ability of the method to extract relevant consensi. The experimental results also show that this algorithm is much faster than existing ones, sometimes by more than two orders of magnitude.


“…The efficiency of the filter relies on an original data structure, the bi-factor array, that is also introduced in this paper, and on a labelling of the seeds similar to the one employed in [8]. This new data structure can be used to speed up other tasks such as the inference of structured motifs [18] or for improving other filters [14].…”

Section: Introduction (mentioning)

confidence: 99%

Lossless Filter for Finding Long Multiple Approximate Repetitions Using a New Data Structure, the Bi-factor Array

Peterlongo, Pisanti, Boyer et al. 2005

String Processing and Information Retrieval

Self Cite


Abstract. Similarity search in texts, notably biological sequences, has received substantial attention in the last few years. Numerous filtration and indexing techniques have been created in order to speed up the resolution of the problem. However, previous filters were made for speeding up pattern matching, or for finding repetitions between two sequences or occurring twice in the same sequence. In this paper, we present an algorithm called NIMBUS for filtering sequences prior to finding repetitions occurring more than twice in a sequence or in more than two sequences. NIMBUS uses gapped seeds that are indexed with a new data structure, called a bi-factor array, that is also presented in this paper. Experimental results show that the filter can be very efficient: preprocessing with NIMBUS a data set where one wants to find functional elements using a multiple local alignment tool such as GLAM ([7]), the overall execution time can be reduced from 10 hours to 6 minutes while obtaining exactly the same results.


Algorithmic Issues in the Analysis of Chip‐Seq Data

Zambelli, Pavesi 2010

Algorithms in Computational Molecular Biology

