The number of k-mer matches between two DNA sequences as a function of k and applications to estimate phylogenetic distances

Röhling, Sophie; Linne, Alexander; Schellhorn, Jendrik; Hosseini, Morteza; Dencker, Thomas; Morgenstern, Burkhard

doi:10.1371/journal.pone.0228070

Cited by 41 publications

(29 citation statements)

References 63 publications


“…From Figure 5, it can be seen that several other popular methods have RF distances very close to K-Phylo. It should be noted that results on this dataset are surprising as methods performing well on the rest of the datasets performed poorly here and vice versa as claimed in [15]. The tree estimated by K-Phylo on this dataset and the benchmark tree are available in Figure S1 of Supplementary Data.…”

Section: Yersinia Strains (mentioning)

confidence: 70%

“…The limitation of this selection process is that it is solely dependent on sequence length and does not take into account resemblances between sequences. Another mechanism in [15] explains a method of finding a range of feasible values of k as a Figure 2: First, different k-mers are listed from the input sequences. Then separate binary matrices from all these k-mer counts are produced.…”

Section: Finding An Appropriate K-mer Length (mentioning)

confidence: 99%


An Alignment-free Method for Phylogeny Estimation using Maximum Likelihood

Zahin, Abrar, Rahman et al. 2019

Preprint


Phylogenetic analysis i.e. construction of an accurate phylogenetic tree from genomic sequences of a set of species is one of the main challenges in bioinformatics. The popular approaches to this require aligning each pair of sequences to calculate pairwise distances or aligning all the sequences to construct a multiple sequence alignment. The computational complexity and difficulties in getting accurate alignments have led to development of alignment-free methods to estimate phylogenies. However, the alignment free approaches focus on computing distances between species and do not utilize statistical approaches for phylogeny estimation. Herein, we present a simple alignment free method for phylogeny construction based on contiguous sub-sequences of length k termed k-mers. The presence or absence of these k-mers are used to construct a phylogeny using a maximum likelihood approach. The results suggest our method is competitive with other alignment-free approaches, while outperforming them in some cases.
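As a rough illustration of the presence/absence encoding this abstract describes, the following minimal Python sketch builds a binary k-mer matrix for a set of sequences. The function names and the toy sequences are assumptions made for illustration and are not taken from the cited preprint, which goes on to infer a tree from such data with a maximum likelihood approach.

```python
def kmers(seq, k):
    """Return the set of contiguous substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def presence_absence_matrix(seqs, k):
    """Build a binary k-mer presence/absence matrix.

    seqs maps a (hypothetical) taxon name to its sequence string.
    Returns the sorted list of all observed k-mers (the columns) and a
    dict mapping each taxon name to its row of 0/1 entries.
    """
    kmer_sets = {name: kmers(seq, k) for name, seq in seqs.items()}
    columns = sorted(set().union(*kmer_sets.values()))
    matrix = {
        name: [1 if kmer in present else 0 for kmer in columns]
        for name, present in kmer_sets.items()
    }
    return columns, matrix


# Toy input, invented for this example only.
columns, matrix = presence_absence_matrix({"a": "ACGT", "b": "CGTA"}, k=2)
print(columns)
for name, row in matrix.items():
    print(name, row)
```

Each row of the matrix can be read as a binary character string for one taxon, which is the kind of input that character-based tree inference works on.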


“…One of the fundamental tasks of genomics is to compare these sequences for phylogenetic analysis. Several methods are available to compare genetic sequences, either through sequence alignment [1,2] or alignment-free approach [3,4,5,6,7]. Due to large size of genomic data, the sequence alignment approaches are not time and memory efficient and also have some shortcomings [8,9].…”

Section: Introduction (mentioning)

confidence: 99%

k-mer proximity index for phylogeny comparison of SARS-CoV-2 with other pathogens

Pratibha, Shaju, Kamal et al. 2020

Preprint


We developed a compact and computationally inexpensive method for in-silico comparison of nucleotide sequences at a macro level using subtraction-percentage plots (SP-plots) of a modified chaos game representation (CGR). Analyzing these plots, we defined the k-mer proximity index quantifying the differences between SARS-CoV-2 and other pathogens’ genome sequences. We categorized 31 pathogens, on the basis of their proximity to SARS-CoV-2, in four groups to possibly plan a treatment strategy for Covid-19.


“…In recent years, a large number of alignment-free approaches to phylogeny reconstruction have been developed and applied, since these methods are much faster than traditional, alignment-based phylogenetic methods, see [50,39,3,25] for recent review papers. Most alignment-free approaches are based on k-mer statistics [21,44,7,48,17], but there are also approaches based on the length of common substrings [47,8,27,37,32,46], on word or spaced-word matches [38,33,35,34,1,41] or on so-called micro-alignments [49,20,29,28]. As has been mentioned by various authors, an additional advantage of many alignment-free methods is that they can be applied not only to complete genome sequences, but also to unassembled reads.…”

Section: Introduction (mentioning)

confidence: 99%

Read-SpaM: assembly-free and alignment-free comparison of bacterial genomes with low sequencing coverage

Lau, Leimeister, Dörrer et al. 2019

Preprint

Self Cite


In many fields of biomedical research, it is important to estimate phylogenetic distances between taxa based on low-coverage sequencing reads. Major applications are, for example, phylogeny reconstruction, species identification from small sequencing samples, or bacterial strain typing in medical diagnostics. Herein, we adapt our previously developed software program Filtered Spaced-Word Matches (FSWM) for alignment-free phylogeny reconstruction to work on unassembled reads; we call this implementation Read-SpaM. Test runs on simulated reads from bacterial genomes show that our approach can estimate phylogenetic distances with high accuracy, even for large evolutionary distances and for very low sequencing coverage.
