2009
DOI: 10.1186/1748-7188-4-14

iTriplet, a rule-based nucleic acid sequence motif finder

Abstract: Background: With the advent of high-throughput sequencing techniques, large amounts of sequencing data are readily available for analysis. Natural biological signals are intrinsically highly variable, making their complete identification a computationally challenging problem. Many statistical and combinatorial approaches have been applied with great success in the past. However, identifying highly degenerate and long (>20 nucleotides) motifs remains an unmet challenge, as high degeneracy will d…

citations
Cited by 29 publications
(35 citation statements)
references
References 37 publications
“…6 shows sequence logos [34] of the predicted motifs, which graphically shows the degree of motif conservation measured by relative entropy. Note that many existing recognition algorithms [1, 3-5, 18, 29] also test their validity on these data sets. Since all of these algorithms (including PairMotif+) show a good performance on these data sets, here we do not make comparisons.…”
Section: Results (mentioning)
confidence: 99%
“…For a given motif length l, the larger the number of degenerate positions d, the more difficult it is to identify the planted (l, d) motif in input sequences. Specifically, some researchers [3, 4] use the 2d-neighborhood probability (i.e., the probability that the Hamming distance between two random l-mers is not larger than 2d) to measure the difficulty of solving different PMS instances, since it is a good indicator of the degree of degeneracy of PMS instances [3].…”
Section: Introduction (mentioning)
confidence: 99%
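
The 2d-neighborhood probability quoted above can be worked out directly. The following is a minimal sketch, assuming each position of the two random l-mers is drawn independently and uniformly from a 4-letter alphabet (an assumption not stated in the quoted text). Under that assumption two positions differ with probability 3/4, so the Hamming distance is binomially distributed. The function name `neighborhood_probability` is illustrative only.

```python
from math import comb


def neighborhood_probability(l, d, alphabet_size=4):
    """Probability that two random l-mers lie within Hamming distance 2d.

    Assumes i.i.d. uniform symbols, so each position mismatches
    with probability (alphabet_size - 1) / alphabet_size.
    """
    p_mismatch = (alphabet_size - 1) / alphabet_size
    max_dist = min(2 * d, l)  # the distance can never exceed l
    return sum(
        comb(l, i) * p_mismatch**i * (1 - p_mismatch) ** (l - i)
        for i in range(max_dist + 1)
    )


if __name__ == "__main__":
    # Compare two commonly discussed instance shapes.
    for l, d in [(15, 4), (15, 5)]:
        print(l, d, neighborhood_probability(l, d))
```

A larger value means that unrelated l-mers more often look like pairs of motif instances, which is one way to read the statement that larger d makes an instance harder.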
“…Specifically, they generate candidate motifs by using all possible h-tuples T = (x1, x2, ..., xh) composed of h l-length strings coming from h distinct reference sequences. In existing pattern-driven PMS algorithms, h is 1 for PMSP [5] and PMSPrune [6]; h is 2 for PairMotif [7], qPMS7 [8] and TravStrR [9]; h is 3 for iTriplet [10] and PMS5 [11]; for PMS8 [12] and qPMS9 [13], h is greater than or equal to 3 and self-adaptive in dealing with different PMS problem instances. Moreover, these algorithms use k = t - q + h reference sequences to generate candidate motifs, ensuring that there exists at least one h-tuple T so that each l-length string in T is a motif instance.…”
Section: Introduction (mentioning)
confidence: 99%
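
The choice of k = t - q + h follows from a counting argument: if the motif occurs in at least q of the t input sequences, at most t - q sequences lack an instance, so any t - q + h sequences include at least h that contain one. The sketch below enumerates h-tuples of l-mers from h distinct reference sequences. It only illustrates the enumeration described in the statement and is not the implementation of any of the named algorithms; `lmers` and `h_tuples` are hypothetical helper names.

```python
from itertools import combinations, product


def lmers(seq, l):
    """All length-l substrings of seq, in left-to-right order."""
    return [seq[i:i + l] for i in range(len(seq) - l + 1)]


def h_tuples(reference_seqs, l, h):
    """Yield every h-tuple of l-mers taken from h distinct reference sequences."""
    for chosen in combinations(reference_seqs, h):
        # One l-mer from each of the h chosen sequences.
        yield from product(*(lmers(s, l) for s in chosen))


if __name__ == "__main__":
    t, q, h = 5, 4, 3          # illustrative quorum instance
    k = t - q + h              # number of reference sequences
    refs = ["ACGTAC", "TTACGA", "GACGTT", "CACGTA"][:k]
    print(k, sum(1 for _ in h_tuples(refs, l=3, h=h)))
```

With h = 3 this mirrors the triplet idea attributed to iTriplet and PMS5 in the statement; the number of tuples grows quickly with h, which is why the published algorithms prune candidates rather than enumerate naively.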
“…For the exact algorithms proposed in earlier years, such as WINNOWER [3], their search space is composed of (n - l + 1)^t possible alignments of motif instances. In recent years, the exact algorithms verify all the l-length patterns in the O(|Σ|^l) search space, and they are called the pattern-driven PMS algorithms [5][6][7][8][9][10][11][12][13].…”
Section: Introduction (mentioning)
confidence: 99%
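
To give a sense of scale for the two search spaces contrasted above, here is a small arithmetic sketch. It assumes t input sequences of length n, motif length l, and a 4-letter alphabet; the instance sizes in the example are illustrative and are not taken from the cited paper.

```python
def alignment_space(n, l, t):
    """Number of ways to pick one l-mer start position in each of t sequences."""
    return (n - l + 1) ** t


def pattern_space(l, alphabet_size=4):
    """Number of distinct l-length patterns over the alphabet."""
    return alphabet_size ** l


if __name__ == "__main__":
    n, l, t = 600, 15, 20  # illustrative instance size
    print("alignments:", alignment_space(n, l, t))
    print("patterns:  ", pattern_space(l))
```

For these sizes the pattern space is 4^15 = 1,073,741,824, while the alignment space is 586^20, which is many orders of magnitude larger.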
“…They take all string patterns of length l over Σ as candidate motifs, and output the patterns that can span all input sequences. Typical pattern-driven algorithms aim to reduce candidate motifs through various means [11], [12], [13], [14], [15], [16], [17], [18]. Some other pattern-driven algorithms represent the input sequences as a suffix tree to accelerate the verification of candidate motifs [19], [20], [21].…”
Section: Introduction (mentioning)
confidence: 99%
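
The baseline pattern-driven idea in the statement above can be sketched as a brute-force search: enumerate every pattern of length l and keep those that occur in every input sequence with at most d mismatches. This is a naive illustration, assuming a DNA alphabet and Hamming-distance matching; it includes none of the candidate reduction or suffix-tree acceleration the statement mentions, and all function names are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(pattern, seq, d):
    """True if some l-mer of seq is within Hamming distance d of pattern."""
    l = len(pattern)
    return any(
        hamming(pattern, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def brute_force_motifs(seqs, l, d, alphabet="ACGT"):
    """All l-length patterns that occur, with at most d mismatches, in every sequence."""
    motifs = []
    for letters in product(alphabet, repeat=l):
        pattern = "".join(letters)
        if all(occurs_within(pattern, s, d) for s in seqs):
            motifs.append(pattern)
    return motifs


if __name__ == "__main__":
    print(brute_force_motifs(["AACG", "ACGT", "TACG"], l=3, d=0))
```

The loop visits all 4^l patterns, so it is only practical for short motifs; that cost is what the candidate-reduction strategies in the cited algorithms are meant to avoid.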