2000
DOI: 10.1016/s0097-8485(00)80006-6

Sequence complexity for biological sequence analysis

Abstract: A new statistical model for DNA treats a sequence as a mixture of regions with little structure and regions that are approximate repeats of other subsequences, i.e. instances of a repeat need not match each other exactly. Both forward and reverse-complementary repeats are allowed. The model has a small number of parameters, which are fitted to the data. In general there are many explanations for a given sequence, and it is shown how to compute the total probability of the data given the model. Computer a…

Citations: cited by 33 publications (14 citation statements)
References: 31 publications
“…Moreover, there is also the possibility to use the reverse complement repeats, where they can also be exact or approximate copies. An example can be seen in Figure 3, with information profiles assessment [35,36], where the zones of low complexity are related with similarity, and hence, exact or approximate repetitions seen in other parts of the sequence, as in the first plot (real DNA FASTQ sequence). The second plot shows a simulated sequence with absence of repeats, and hence, the zones of low complexity are also absent.…”
Section: Results (mentioning)
confidence: 99%
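The idea of an exact or approximate reverse-complement repeat mentioned in the statement above can be illustrated with a minimal Python sketch. The function names and the `max_mismatches` threshold are hypothetical, chosen only for illustration. The mismatch count is a plain Hamming distance, which is an assumption and not the statistical model of the cited paper.

```python
# Illustrative sketch only: names and the mismatch threshold are assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_reverse_complement_repeat(a: str, b: str, max_mismatches: int = 0) -> bool:
    """True if b copies the reverse complement of a with at most max_mismatches differences."""
    return mismatches(reverse_complement(a), b) <= max_mismatches
```

With `max_mismatches=0` only exact reverse-complement copies are accepted; raising the threshold admits approximate copies.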
“…Nevertheless, Ziv and Lempel (1976) devised an eminently computable implementation of Solomonoff-Kolmogorov-Chaitin complexity which draws upon the occurrence of direct repeats in a sequence. More advanced measures of similar type have recently been proposed which take additional linguistic features into account, including palindromes and inexact repeats (Grumbach and Tahi 1994; Allison et al 2000). Gusev et al (1999) have generalized the Lempel-Ziv approach to allow for the occurrence of all kinds of biologically relevant repeats in a DNA sequence, including direct and inverted repeats and inversions thereof.…”
Section: Complexity Analysis (mentioning)
confidence: 99%
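To make the direct-repeat idea behind the Lempel-Ziv measure in the statement above concrete, here is a minimal Python sketch that counts the phrases in a Lempel-Ziv (1976) style parsing. Each phrase is the shortest prefix of the remaining text that cannot be copied from a source starting earlier in the sequence. The function name `lz76_phrase_count` is hypothetical, and this simple quadratic-time version handles exact direct repeats only, not the inverted or inexact repeats of the later generalizations.

```python
# Illustrative sketch only: a simple, unoptimized phrase count in the
# style of Lempel and Ziv (1976); the function name is an assumption.
def lz76_phrase_count(seq: str) -> int:
    """Count the phrases produced by a left-to-right Lempel-Ziv style parsing."""
    i, count, n = 0, 0, len(seq)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from a source that
        # starts before position i (the source may overlap the phrase).
        while i + length <= n and seq.find(seq[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1
        i += length
    return count
```

A highly repetitive input such as `"AAAAAAAA"` parses into 2 phrases, while `"ACGTACGT"` parses into 5: four single-letter phrases followed by one phrase copied from the first half.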
“…Chen et al [2] showed that compressibility is a good measurement of relatedness between sequences and can be effectively used in sequence alignment and evolutionary tree construction. According to Allison et al [3] compression of DNA sequences also results in the intelligent analysis of these sequences. Compression also plays an important role in efficient sequence classification [4].…”
Section: Introduction (mentioning)
confidence: 99%