2017
DOI: 10.1186/s13015-017-0118-8

Phylogeny reconstruction based on the length distribution of k-mismatch common substrings

Abstract: Background: Various approaches to alignment-free sequence comparison are based on the length of exact or inexact word matches between pairs of input sequences. Haubold et al. (J Comput Biol 16:1487–1500, 2009) showed how the average number of substitutions per position between two DNA sequences can be estimated based on the average length of exact common substrings. Results: In this paper, we study the length distribution of k-mismatch common substrings between two sequences. We show that the number of substitution…
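To make the central quantity of the abstract concrete, the following is a minimal sketch of how a length distribution of k-mismatch common substrings could be tabulated. It is not the authors' implementation. It assumes one common definition: for each position i in the first sequence, take the longest substring starting at i that matches some substring of the second sequence with at most k mismatches. The function names (`extension_length`, `k_mismatch_lengths`, `length_distribution`) are hypothetical, and the brute-force search is only meant for short illustrative inputs.

```python
from collections import Counter


def extension_length(a, i, b, j, k):
    """Length of the longest common extension of a[i:] and b[j:]
    that contains at most k mismatches (assumed definition)."""
    length = 0
    mismatches = 0
    while i + length < len(a) and j + length < len(b):
        if a[i + length] != b[j + length]:
            if mismatches == k:
                break  # one more mismatch would exceed the budget k
            mismatches += 1
        length += 1
    return length


def k_mismatch_lengths(a, b, k):
    """For every start position i in a, the maximum extension length
    over all start positions j in b (brute force, O(|a| * |b| * L))."""
    return [
        max((extension_length(a, i, b, j, k) for j in range(len(b))), default=0)
        for i in range(len(a))
    ]


def length_distribution(a, b, k):
    """Histogram mapping substring length -> number of positions in a."""
    return Counter(k_mismatch_lengths(a, b, k))
```

For example, `k_mismatch_lengths("ACGT", "AGGT", 0)` considers exact matches only, while passing `k=1` lets each extension run across one mismatch. Practical tools typically replace the brute-force inner loop with suffix-based index structures.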

Citations: cited by 18 publications (14 citation statements)
References: 35 publications
“…A number of alignment-free methods have been proposed in recent years that estimate the nucleotide match probability p in an (unknown) alignment of two DNA sequences from the number N_k of word or k-mer matches for a fixed word length k. This can be done by comparing N_k to the total number of words in the compared sequences [39] or, equivalently, to the length of the sequences [35]. A certain drawback of these approaches is that they assume that the compared sequences are homologous to each other over their entire length.…”
Section: Availability and Future Directions (mentioning)
confidence: 99%
“…Most of these approaches calculate heuristic measures of sequence (dis-)similarity that are difficult to interpret. At the same time, alignment-free methods have been proposed that can accurately estimate phylogenetic distances between sequences based on stochastic models of DNA or protein evolution, using the length of common substrings [22,35] or so-called micro alignments [54,21,30,29,12].…”
Section: Introduction (mentioning)
confidence: 99%
“…In recent years, a large number of alignment-free approaches to phylogeny reconstruction have been developed and applied, since these methods are much faster than traditional, alignment-based phylogenetic methods, see [51,39,3,25] for recent review papers and [50] for a systematic evaluation of alignment-free software tools. Most alignment-free approaches are based on k-mer statistics [21,44,7,48,17], but there are also approaches based on the length of common substrings [47,8,27,37,32,46], on word or spaced-word matches [38,33,35,34,1,41] or on so-called micro-alignments [49,20,29,28]. As has been mentioned by various authors, an additional advantage of many alignment-free methods is that they can be applied not only to complete genome sequences, but also to unassembled reads.…”
Section: Introduction (mentioning)
confidence: 99%
“…To this end, the program used the average length of common substrings between the compared sequences. Later, we proposed an approach to estimate phylogenetic distances based on the length distribution of k-mismatch common substrings [53].…”
Section: Introduction (mentioning)
confidence: 99%