2001
DOI: 10.1111/j.0006-341x.2001.00441.x

Statistical Measures of DNA Sequence Dissimilarity under Markov Chain Models of Base Composition

Abstract: In molecular biology, the issue of quantifying the similarity between two biological sequences is very important. Past research has shown that word-based search tools are computationally efficient and can find some new functional similarities or dissimilarities invisible to other algorithms like FASTA. Recently, under the independent model of base composition, Wu, Burke, and Davison (1997, Biometrics 53, 1431-1439) characterized a family of word-based dissimilarity measures that defined distance between two se…
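Word-based measures start from counts of overlapping k-words (k-tuples), and a Markov chain model of base composition assigns each word a probability. The Python sketch below is a minimal illustration under stated assumptions, not the paper's method: it counts k-words, estimates first-order transition probabilities from dinucleotide counts, and computes a word's expected count under the fitted chain. All function names are hypothetical, the chain is assumed to start from the observed base frequencies, and the dissimilarity measures themselves (cut off in the abstract) are not reproduced.

```python
from collections import Counter

ALPHABET = "ACGT"

def word_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-words in a DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def fit_markov1(seq: str):
    """Estimate base frequencies and first-order transition probabilities.

    P(a -> b) = count(ab) / sum over c of count(ac); rows for bases that are
    never followed by another base are left at 0.0.
    """
    base = Counter(seq)
    pi = {b: base[b] / len(seq) for b in ALPHABET}
    pairs = word_counts(seq, 2)
    trans = {}
    for a in ALPHABET:
        row_total = sum(pairs[a + b] for b in ALPHABET)
        for b in ALPHABET:
            trans[(a, b)] = pairs[a + b] / row_total if row_total else 0.0
    return pi, trans

def word_prob(word: str, pi, trans) -> float:
    """P(word) under the fitted chain, starting from the base frequencies pi."""
    p = pi[word[0]]
    for a, b in zip(word, word[1:]):
        p *= trans[(a, b)]
    return p

def expected_count(word: str, seq_len: int, pi, trans) -> float:
    """Expected occurrences over the seq_len - k + 1 overlapping word positions."""
    return (seq_len - len(word) + 1) * word_prob(word, pi, trans)

if __name__ == "__main__":
    seq = "ACGTACGTAAGT"
    pi, trans = fit_markov1(seq)
    for w in ("ACG", "AAG", "GTA"):
        observed = word_counts(seq, 3)[w]
        print(w, observed, round(expected_count(w, len(seq), pi, trans), 3))
```

Comparing observed counts with these model-based expectations is one way a Markov background can enter a word-based measure; the independent model mentioned in the abstract corresponds to replacing each transition probability P(a -> b) with the plain base frequency of b.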

Cited by 109 publications (78 citation statements). References: 15 publications.
“…Whereas the lack of significant differences among Mahalanobis distance-based methods can be partly attributed to the modest number of iterations used in the simulations (100) and the conservative Tukey criterion based on 18 means, we believe that only strong differences were of interest for attempting any general conclusions from these simulations. It is particularly interesting that the Euclidean distance-based method $d_E(\beta_i, \beta_j)$ did not perform as well as our method, because in many multivariate applications a Euclidean distance is proposed as a computationally less expensive approximation to a Mahalanobis distance (e.g., Wu et al 2001).…”
Section: Discussion; citation type: mentioning
confidence: 91%
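To make the Euclidean-versus-Mahalanobis contrast in this statement concrete, here is a small pure-Python sketch. It is restricted to two dimensions so the covariance matrix can be inverted in closed form; the function names are hypothetical, and the covariance values are illustrative assumptions rather than anything estimated in the cited work.

```python
import math

def euclidean(x, y):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance sqrt(d^T S^-1 d) for 2-D vectors, with d = x - y.

    `cov` is a 2x2 covariance matrix S given as nested tuples; it is
    inverted in closed form, so this sketch only handles two dimensions.
    """
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    i11, i12 = s22 / det, -s12 / det
    i21, i22 = -s21 / det, s11 / det
    d0, d1 = x[0] - y[0], x[1] - y[1]
    q = d0 * (i11 * d0 + i12 * d1) + d1 * (i21 * d0 + i22 * d1)
    return math.sqrt(q)

if __name__ == "__main__":
    x, y = (1.0, 1.0), (0.0, 0.0)
    identity = ((1.0, 0.0), (0.0, 1.0))
    correlated = ((1.0, 0.5), (0.5, 1.0))
    print(euclidean(x, y), mahalanobis_2d(x, y, identity), mahalanobis_2d(x, y, correlated))
```

With an identity covariance the two distances coincide (both sqrt(2) here), which is why Euclidean distance serves as the cheaper stand-in. With positively correlated components, the Mahalanobis distance along the correlated direction shrinks to sqrt(4/3) ≈ 1.155, so the two measures can rank pairs differently.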
“…The most commonly used measures are Euclidean distance, $d^2$ distance (a weighted Euclidean distance), Mahalanobis distance and Kullback-Leibler discrepancy (KLD). Since Wu, Hsieh, and Li (2001) find in their experiments that KLD provides good results while it still can be computed as fast as Euclidean distance, it is also used here. Since KLD becomes −∞ for counts of zero, we add one to all counts which conceptually means that we start building the EMM with a prior that all triplets have the equal occurrence probability (see Wu et al 2001).…”
Section: Genetic Sequence Analysis; citation type: mentioning
confidence: 99%
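The add-one step described in this statement can be sketched as follows: count the 64 DNA triplets, add a pseudocount of one to each so that no probability is zero, and compute the Kullback-Leibler discrepancy between two such distributions. This is a hedged illustration, not the citing authors' code. It assumes the directional form KL(p || q) with natural logarithms, uses hypothetical function names, and does not reproduce the EMM construction mentioned in the quote.

```python
import math
from collections import Counter
from itertools import product

# All 64 DNA triplets in a fixed order.
TRIPLETS = ["".join(t) for t in product("ACGT", repeat=3)]

def triplet_distribution(seq: str, pseudocount: int = 1) -> dict:
    """Triplet frequencies with `pseudocount` added to every triplet.

    With pseudocount=1 this is the "add one to all counts" step: every
    triplet gets a nonzero probability, starting from a uniform prior.
    Windows containing non-ACGT characters are ignored.
    """
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    total = sum(counts[t] for t in TRIPLETS) + pseudocount * len(TRIPLETS)
    return {t: (counts[t] + pseudocount) / total for t in TRIPLETS}

def kl_divergence(p: dict, q: dict) -> float:
    """KL(p || q) = sum_t p(t) * ln(p(t) / q(t)); finite when every q(t) > 0."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p if p[t] > 0)

if __name__ == "__main__":
    p = triplet_distribution("ACGTACGTACGT")
    q = triplet_distribution("AAAAAAAAAATT")
    # The two directions differ in general: KLD is not symmetric.
    print(kl_divergence(p, q), kl_divergence(q, p))
```

Without the pseudocount, any triplet absent from the second sequence would make q(t) = 0 and the logarithm undefined, which is the failure the quoted passage avoids. Because KL is asymmetric, a distance-like use may prefer a symmetrized variant; which form the cited works use is not stated here.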
“…Sensitivity and selectivity were computed to evaluate and compare the performance of the proposed models with other distance measures [33]. Sensitivity is expressed by the number of A. testaceum related sequences found among the first closest five library sequences.…”
Section: Similarity Search; citation type: mentioning
confidence: 99%
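The sensitivity criterion quoted above ranks library sequences by their dissimilarity to a query and counts how many related sequences fall among the five closest. A minimal sketch, assuming distances have already been computed with some measure (any of those discussed above would do). The function name and the sequence names are hypothetical. Selectivity is not defined in this excerpt, so it is not sketched.

```python
def related_in_top_k(distances: dict, related: set, k: int = 5) -> int:
    """Number of `related` library sequences among the k closest to the query.

    `distances` maps library-sequence names to their dissimilarity from the
    query; smaller means closer. Ties keep dict insertion order (sorted is stable).
    """
    ranked = sorted(distances, key=distances.get)
    return sum(1 for name in ranked[:k] if name in related)

if __name__ == "__main__":
    dists = {"s1": 0.10, "s2": 0.50, "s3": 0.20, "s4": 0.90, "s5": 0.30, "s6": 0.05}
    print(related_in_top_k(dists, {"s1", "s4", "s6"}))  # 2: s6 and s1 rank in the top five
```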