2008
DOI: 10.1186/1471-2105-9-s6-s15

An improved string composition method for sequence comparison

Abstract: Background: Historically, two categories of computational algorithms (alignment-based and alignment-free) have been applied to sequence comparison, one of the most fundamental issues in bioinformatics. Multiple sequence alignment, although dominantly used by biologists, possesses both fundamental and computational limitations. Consequently, alignment-free methods have been explored as important alternatives in estimating sequence similarity. Of the alignment-free methods, the string composition vector (CV)…

Cited by 38 publications (47 citation statements)
References 22 publications
“…Lu et al (2008) have found two problems associated with composition vector methods: (a) there is a positive correlation between the observed count $c(w_{k,1} \dots w_{k,k})$ and the estimated expected count $c_0(w_{k,1} \dots w_{k,k})$, and (b) a square root needs to be applied to the denominator. Without such an operation, the normalized count tends to be over-standardized.…”
Section: Word Statistics (mentioning)
confidence: 99%
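
To make the statement above concrete, here is a minimal Python sketch of a normalized k-word count with and without the square root in the denominator. The quoted statement does not say how the expected count is estimated, so the Markov-type estimate used here (prefix count times suffix count, divided by the count of the shared middle word) is an assumption, as are the function names `kmer_counts`, `expected_count`, and `normalized_counts`. It is an illustration of the idea, not the implementation of Lu et al (2008).

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count overlapping k-words in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def expected_count(word, c_km1, c_km2):
    """Assumed Markov-type estimate: c0(w) = c(prefix) * c(suffix) / c(middle)."""
    middle = c_km2[word[1:-1]]
    if middle == 0:
        return 0.0
    return c_km1[word[:-1]] * c_km1[word[1:]] / middle


def normalized_counts(seq, k, use_sqrt=True):
    """Return {word: normalized count} for every observed k-word (k >= 3).

    use_sqrt=True  divides (c - c0) by sqrt(c0);
    use_sqrt=False divides (c - c0) by c0, with no square root.
    """
    if k < 3:
        raise ValueError("k must be at least 3 for this estimate")
    c_k = kmer_counts(seq, k)
    c_km1 = kmer_counts(seq, k - 1)
    c_km2 = kmer_counts(seq, k - 2)
    result = {}
    for word, observed in c_k.items():
        c0 = expected_count(word, c_km1, c_km2)
        if c0 == 0:
            result[word] = 0.0  # guard only; observed words give c0 > 0
            continue
        denom = sqrt(c0) if use_sqrt else c0
        result[word] = (observed - c0) / denom
    return result


if __name__ == "__main__":
    toy = "AAAAC"  # hypothetical toy sequence
    print(normalized_counts(toy, 3, use_sqrt=True))
    print(normalized_counts(toy, 3, use_sqrt=False))
```

Calling `normalized_counts` twice on the same sequence, once with each setting of `use_sqrt`, shows how the choice of denominator alone changes the scale of each word's score.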
“…This enables building more complex, biologically realistic models with large numbers of parameters, such as Markov model (Pham and Zuegg 2004; Hao and Qi 2004; Wu et al 2006), mix model such as Markov model plus k-word distributions (Kantorovitz et al 2007), and Bernoulli model assuming a known word distribution (Lu et al 2008). Although the more complex models in biological sequence comparison are general improvements over the traditional word-based models (Blaisdell 1986; Wu et al 1997, 2001; Stuart et al 2002), some problems in developing statistical models and estimating the parameters of the complex models have impeded the development and adoption of these or other more complex models.…”
Section: Introduction (mentioning)
confidence: 98%
“…This trend will probably continue with new transformations that emerge from its integrative, quantitative and impressive nature (Fuchs, 2002). This growing proliferation of data from biological sequences made possible the development of many algorithms for the analysis and mining of knowledge (Lu et al, 2008).…”
Section: Introduction (mentioning)
confidence: 99%