1986
DOI: 10.1073/pnas.83.14.5155

A measure of the similarity of sets of sequences not requiring sequence alignment.

Abstract: Determination of first- and second-order Markov chain homogeneity of sets of nuclear eukaryotic DNA sequences, both coding and noncoding, finds similarities imperceptible to the standard Needleman-Wunsch base matching or dot-matrix algorithms. These measures of the similarities of the distributions of adjacent pairs or triplets are in agreement with accepted evolutionary-tree topologies. Hierarchical clustering of the distributions of doublets of 30 miscellaneous coding sequences gives clusters in reasonable ag…
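The abstract does not spell out the test statistic. The sketch below shows one standard way to test whether a set of sequences shares a single first-order (doublet) or second-order (triplet) Markov chain. It assumes a Pearson chi-square contingency test on transition counts, with one table per preceding context. This is not necessarily the statistic used in the paper, and the names `transition_counts` and `markov_homogeneity_chi2` are hypothetical.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def transition_counts(seq, order=1):
    """Count (context, next base) pairs; order=1 uses doublets, order=2 triplets.
    Words containing symbols outside ACGT are skipped."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - order):
        word = seq[i:i + order + 1]
        if all(ch in ALPHABET for ch in word):
            counts[(word[:-1], word[-1])] += 1
    return counts


def markov_homogeneity_chi2(seqs, order=1):
    """Pearson chi-square statistic for H0: every sequence in `seqs` follows the
    same Markov chain of the given order. Returns (statistic, degrees_of_freedom)."""
    counts = [transition_counts(s, order) for s in seqs]
    stat, df = 0.0, 0
    for ctx in ("".join(p) for p in product(ALPHABET, repeat=order)):
        # One contingency table per context: rows = sequences, columns = next base.
        table = [[c[(ctx, b)] for b in ALPHABET] for c in counts]
        row_tot = [sum(row) for row in table]
        col_tot = [sum(row[j] for row in table) for j in range(len(ALPHABET))]
        total = sum(row_tot)
        if total == 0:
            continue  # context never observed in any sequence
        for r, row in enumerate(table):
            for j, observed in enumerate(row):
                expected = row_tot[r] * col_tot[j] / total
                if expected > 0:
                    stat += (observed - expected) ** 2 / expected
        # Only rows and columns with nonzero totals contribute degrees of freedom.
        used_rows = sum(1 for t in row_tot if t > 0)
        used_cols = sum(1 for t in col_tot if t > 0)
        df += (used_rows - 1) * (used_cols - 1)
    return stat, df
```

A large statistic relative to its degrees of freedom argues against a shared Markov chain. If SciPy is available, `scipy.stats.chi2.sf(stat, df)` converts the statistic to a p-value. No alignment is needed, because each sequence contributes only its own doublet or triplet counts.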

Cited by 311 publications (201 citation statements)
References 46 publications
“…More will be said about these methods, but they have been applied to gene and genome sequences with varying levels of success (e.g. Blaisdell 1986, 1989; Höhl et al 2006; Ferragina et al 2007). They have not yet been applied to ESS.…”
Section: Introduction (mentioning, confidence: 99%)
“…In the mid-1980s, n-grams were adapted for use in comparing gene sequences [24], an approach typically called "alignment-free". In this approach, similarity among individual sequences is gauged by comparing the frequency of all n-grams [14,29].…”
Section: Introduction (mentioning, confidence: 99%)
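The statement above describes comparing "the frequency of all n-grams" between sequences. The sketch below is a minimal version of that idea. It assumes relative frequencies over all 4^k possible DNA words and a plain Euclidean distance between the two profiles. The cited works may use other statistics, and `kmer_profile` and `kmer_distance` are hypothetical names.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_profile(seq, k, alphabet="ACGT"):
    """Relative frequency of every possible k-mer (n-gram) over `alphabet`.
    Windows containing other symbols are not among the words and are ignored."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[w] for w in words)
    if total == 0:
        return [0.0] * len(words)
    return [counts[w] / total for w in words]


def kmer_distance(s1, s2, k=2):
    """Euclidean distance between the k-mer frequency profiles of two sequences."""
    p, q = kmer_profile(s1, k), kmer_profile(s2, k)
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```

Normalising counts to frequencies lets sequences of different lengths be compared directly. The profiles from `kmer_profile` can also be fed to any hierarchical clustering routine to group sequences without aligning them.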
“…Nonetheless, the use of n-gram-statistics is computationally less expensive, and therefore remains relevant to both linguistics and genomics. The application of n-gram statistics in genomics may be particularly useful for genomics, as the structure of the genome is possibly a finite state. In the mid-1980s, n-grams were adapted for use in comparing gene sequences [24], an approach typically called "alignment-free". In this approach, similarity among individual sequences is gauged by comparing the frequency of all n-grams [14,29].…”
Mentioning (confidence: 99%)
“…But MSA is not without limitations. MSA, based on heuristic algorithms [62], provides alignment scores for which relevance to homology can be difficult to assess statistically [63].…”
Section: Research Problem (mentioning, confidence: 99%)
“…Dot-matrix methods were prefigured by Walter Fitch [63] and others prior to their formal description by Gibbs and MacIntyre [64]. Adrian Gibbs and colleagues considered the dot-matrix to subsume the sliding-window approach [62], and to be "similar in principle" [64] to a method explained by Saul…”
Section: Oligonucleotides and K-mers (mentioning, confidence: 99%)
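For contrast with the alignment-free measures above, here is a minimal sketch of a dot-matrix comparison with a sliding-window filter, as mentioned in the statement about Gibbs and colleagues. It assumes a simple rule: a cell is marked when two windows match in at least `threshold` positions. The function names and the `window` and `threshold` parameters are hypothetical, not taken from the original publications.

```python
def dot_matrix(s1, s2, window=1, threshold=None):
    """Grid of booleans: cell (i, j) is True when s1[i:i+window] and
    s2[j:j+window] agree in at least `threshold` positions (default: all)."""
    if threshold is None:
        threshold = window
    rows = max(len(s1) - window + 1, 0)
    cols = max(len(s2) - window + 1, 0)
    grid = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Count position-by-position matches inside the two windows.
            matches = sum(a == b for a, b in zip(s1[i:i + window], s2[j:j + window]))
            row.append(matches >= threshold)
        grid.append(row)
    return grid


def render(grid):
    """Text plot: '*' for a marked cell, '.' otherwise."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in grid)
```

With `window=1` this gives the plain base-matching dot plot. Widening the window and lowering `threshold` suppresses isolated random matches, so diagonal runs that indicate similar regions stand out.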