2011
DOI: 10.2202/1544-6115.1724

Alignment-free Sequence Comparison for Biologically Realistic Sequences of Moderate Length

Abstract: The D2 statistic, defined as the number of matches of words of some pre-specified length k, is a computationally fast alignment-free measure of biological sequence similarity. However, there is some debate about its suitability for this purpose, as the variability in D2 may be dominated by the terms that reflect the noise in each of the single sequences only. We examine the extent of the problem and the effectiveness of overcoming it by using two mean-centred variants of this statistic, D2* and D2c. We con…
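
To make the definition in the abstract concrete, here is a minimal sketch of the D2 count: the number of pairs of positions, one in each sequence, at which the same word of length k occurs. The function names `kmer_counts` and `d2` are illustrative choices, not taken from the paper, and the sketch assumes plain (non-wrapped) sequences over any alphabet.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Number of k-word matches between seq_a and seq_b.

    Equivalent to summing, over all words w, the product of the
    number of occurrences of w in each sequence.
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Counter returns 0 for words absent from seq_b.
    return sum(n * counts_b[word] for word, n in counts_a.items())


if __name__ == "__main__":
    print(d2("ACGT", "ACGA", 2))
```

Counting words once per sequence and multiplying the counts avoids comparing every pair of positions directly, which is one reason the statistic is cheap to compute.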

Cited by 8 publications (9 citation statements)
References 19 publications
“…For instance, in previous studies of a database of cis-regulatory modelled as a set of i.i.d. sequences was successfully studied using the D2 statistics simply by imposing PBCs on the sequences prior to calculating the D2 (Forêt et al, 2009a; Burden et al, 2012).…”
Section: Discussion (mentioning)
confidence: 99%
“…In biological applications of the analogous results for i.i.d. sequences (Forêt et al, 2009a; Burden et al, 2012) we have found generally that the PBCs are not an impediment, as they can simply be imposed on the sequences prior to calculating D2 without seriously affecting its efficacy as a measure of sequence similarity.…”
Section: Introduction (mentioning)
confidence: 89%
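
The citing passages describe imposing PBCs on the sequences before calculating D2. Reading PBCs as periodic boundary conditions (an assumption; the excerpts do not expand the abbreviation), one way to sketch this is to treat each sequence as circular, so that a sequence of length n contributes exactly n words of length k. The names `kmer_counts_pbc` and `d2_pbc` are hypothetical and not drawn from the cited papers.

```python
from collections import Counter


def kmer_counts_pbc(seq, k):
    """Count k-words in seq treated as circular (assumed meaning of PBCs).

    Assumes 1 <= k <= len(seq). The first k - 1 letters are appended
    so that words starting near the end wrap around to the beginning.
    """
    wrapped = seq + seq[:k - 1]
    return Counter(wrapped[i:i + k] for i in range(len(seq)))


def d2_pbc(seq_a, seq_b, k):
    """k-word match count between two circularised sequences."""
    counts_a = kmer_counts_pbc(seq_a, k)
    counts_b = kmer_counts_pbc(seq_b, k)
    return sum(n * counts_b[word] for word, n in counts_a.items())


if __name__ == "__main__":
    # "TACG" is a rotation of "ACGT", so all four circular 2-words match.
    print(d2_pbc("ACGT", "TACG", 2))
```

Under this reading, the wrapped count differs from the unwrapped one only through the k - 1 extra words that span the join.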
“…Needleman to Fitch [65] in 1965 and later introduced as the first algorithm for full-length sequence alignment [66]. Applied to molecular sequences, all these approaches find regions of local identity (or similarity).…”
Section: Oligonucleotides and K-mers (mentioning)
confidence: 99%