2014
DOI: 10.1093/nar/gku398

Spaced words and kmacs: fast alignment-free sequence comparison based on inexact word matches

Abstract: In this article, we present a user-friendly web interface for two alignment-free sequence-comparison methods that we recently developed. Most alignment-free methods rely on exact word matches to estimate pairwise similarities or distances between the input sequences. By contrast, our new algorithms are based on inexact word matches. The first of these approaches uses the relative frequencies of so-called spaced words in the input sequences, i.e. words containing ‘don't care’ or ‘wildcard’ symbols at certain pr…
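
The spaced-word idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes a binary pattern string in which '1' marks a match position and '0' marks a don't-care position. The function name `spaced_word_frequencies`, the '*' wildcard character, and the example sequence are illustrative choices and are not taken from the authors' software.

```python
from collections import Counter


def spaced_word_frequencies(seq, pattern):
    """Return relative frequencies of spaced words in seq.

    pattern is a string of '1' (match position, character kept) and
    '0' (don't-care position, character replaced by '*').
    This is an illustrative sketch, not the published implementation.
    """
    span = len(pattern)
    counts = Counter()
    # Slide a window of the pattern's length over the sequence.
    for i in range(len(seq) - span + 1):
        window = seq[i:i + span]
        # Mask the don't-care positions so that windows differing
        # only at those positions count as the same spaced word.
        word = "".join(c if p == "1" else "*" for c, p in zip(window, pattern))
        counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


# Hypothetical example: pattern "101" ignores the middle character.
freqs = spaced_word_frequencies("ACGTACGA", "101")
```

Two sequences could then be compared by applying any distance measure to their frequency vectors. Which distance to use is a separate design choice that this sketch leaves open.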

Cited by 65 publications (49 citation statements)
References 22 publications
“…For large-scale comparisons of genome-scale sequences, especially highly diverse ones, alignment-free methods of phylogeny construction have been increasingly used in the past few years [23][24][25][26]. There are two categories of alignment-free methods for phylogenomic analysis: one based on the statistics of word frequency, the other on Kolmogorov complexity and chaos theory [27].…”
mentioning
confidence: 99%
“…Although the problem of time shift can be solved using the FFT, frequency-domain analysis cannot accurately resolve the timing relationships within the sequence. Therefore, after the screening steps for j B , pairwise points should be checked in the time domain to improve recognition accuracy based on high-level semantics [7]. In the experiment, k is set to 3~8.…”
Section: Vectors Extracting and Screening
mentioning
confidence: 99%
“…Some sequence matches are also missed due to insertion and deletions between key residue positions of a novel protein. In such cases, direct methods of functional annotation, which rely on scanning a sequence through sliding windows or use global summary of sequence properties such as amino acid composition have proved useful. Protein‐function annotation on a large scale is done by use of Gene Ontologies.…”
Section: Introduction
mentioning
confidence: 99%
“…In such cases, direct methods of functional annotation, which rely on scanning a sequence through sliding windows or use global summary of sequence properties such as amino acid composition have proved useful. [4][5][6][7][8] Protein-function annotation on a large scale is done by use of Gene Ontologies. [4] However, focusing on individual, well understood biological functions and annotating specific biological functions gives much more power to a predictive method, as the annotations can incorporate knowledge specifically relevant for that system.…”
Section: Introduction
mentioning
confidence: 99%