2016
DOI: 10.1089/cmb.2015.0217

ALFRED: A Practical Method for Alignment-Free Distance Computation

Abstract: Alignment-free approaches are gaining persistent interest in many sequence analysis applications such as phylogenetic inference and metagenomic classification/clustering, especially for large-scale sequence datasets. Besides the widely used k-mer methods, the average common substring (ACS) approach has emerged to be one of the well-known alignment-free approaches. Two recent works further generalize this ACS approach by allowing a bounded number k of mismatches in the common substrings, relying on approximatio…

Cited by 26 publications (18 citation statements)
References 23 publications

“…Match-length approaches, in contrast, estimate phylogenetic distances from the length of substring matches between two sequences (Comin and Verzotto, 2012; Haubold et al., 2005; Thankachan et al., 2016; Ulitsky et al., 2006). Since the length of exact substring matches between two homologous sequence regions depends on the mismatch frequency, substitution rates can be estimated, in turn, from the average length of exact common substrings (Domazet-Loso and Haubold, 2009).…”
Section: Introduction (mentioning)
confidence: 99%
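
To make the quantity in the statement above concrete, here is a minimal Python sketch of the average length of exact common substrings between two sequences. It is a naive quadratic illustration, not the method of any cited paper: the function names `longest_match_at` and `average_common_substring` are hypothetical, and the step from this average to a distance or substitution-rate estimate is deliberately left out.

```python
def longest_match_at(a: str, i: int, b: str) -> int:
    """Length of the longest prefix of a[i:] that occurs somewhere in b."""
    best = 0
    for j in range(len(b)):
        length = 0
        # Extend the exact match starting at a[i] and b[j] as far as possible.
        while (i + length < len(a) and j + length < len(b)
               and a[i + length] == b[j + length]):
            length += 1
        best = max(best, length)
    return best


def average_common_substring(a: str, b: str) -> float:
    """Average, over all positions i of a, of the longest exact match with b."""
    if not a:
        raise ValueError("sequence a must be non-empty")
    total = sum(longest_match_at(a, i, b) for i in range(len(a)))
    return total / len(a)


if __name__ == "__main__":
    print(average_common_substring("ACGT", "CGTA"))  # (1 + 3 + 2 + 1) / 4 = 1.75
```

Longer average matches indicate more similar sequences; practical tools replace the inner loops with suffix-based index structures to avoid the quadratic cost.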
“…The algorithm is much more complicated than the original ACS method and even the k-ACS approximation by [14]. Moreover the practical variant of this algorithm can get quite slow for even moderately large values of k due to its exponential dependency on k [21]. However, this algorithm has its merit as the first sub-quadratic time algorithm for exact k-ACS computation for any positive integer k.…”
Section: Introduction (mentioning)
confidence: 99%
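
For orientation, the k-mismatch generalization (k-ACS) discussed in the statement above can be sketched with brute force. This is an assumption-laden illustration only: it is neither the sub-quadratic algorithm nor the approximation the statement refers to, the names `match_length_with_mismatches` and `k_acs` are hypothetical, and the running time is roughly cubic in the sequence length.

```python
def match_length_with_mismatches(a: str, i: int, b: str, j: int, k: int) -> int:
    """Longest extension from a[i] and b[j] that uses at most k mismatches."""
    length = 0
    mismatches = 0
    while i + length < len(a) and j + length < len(b):
        if a[i + length] != b[j + length]:
            if mismatches == k:
                break  # one more mismatch would exceed the budget
            mismatches += 1
        length += 1
    return length


def k_acs(a: str, b: str, k: int) -> float:
    """Average over positions i of a of the longest k-mismatch match with b."""
    if not a:
        raise ValueError("sequence a must be non-empty")
    total = 0
    for i in range(len(a)):
        total += max(
            (match_length_with_mismatches(a, i, b, j, k) for j in range(len(b))),
            default=0,
        )
    return total / len(a)


if __name__ == "__main__":
    print(k_acs("ACGT", "ACTT", 0))  # exact matches only: 1.0
    print(k_acs("ACGT", "ACTT", 1))  # one mismatch allowed: 2.5
```

With k = 0 this reduces to the exact average common substring length; allowing mismatches lengthens the matches, which is the effect the k-ACS methods exploit.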
“…For k = 0 kmacs exactly computes the ACS. Other algorithms besides kmacs [33,29] have been designed to compute alignment-free distances based on longest matches with mismatches, but for the special case k = 0 kmacs 332 Table 3. The first collection contains 932 genomes, the second one contains 4,983 genomes.…”
Section: Preliminary Experiments (mentioning)
confidence: 99%
“…To keep pace with this, several algorithms that go beyond the concept of sequence alignment have been developed, called alignment-free [35]. Alignment-free approaches have been explored in several large-scale biological applications ranging, for instance, from DNA sequence comparison [12,28,14,19,27] to whole-genome phylogeny construction [34,15,13,23,33] and the classification of protein sequences [14]. Most alignment-free approaches above mentioned require, each with its own specific approach and with the use of appropriate data structures, the computation of statistics of the sequences of the analyzed collections.…”
Section: Introduction (mentioning)
confidence: 99%