2019
DOI: 10.1101/2019.12.16.878314
Preprint

Sequence Comparison without Alignment: The SpaM approaches

Abstract: Sequence alignment is at the heart of DNA and protein sequence analysis. For the data volumes that are nowadays produced by massively parallel sequencing technologies, however, pairwise and multiple alignment methods have become too slow for many data-analysis tasks. Therefore, fast alignment-free approaches to sequence comparison have become popular in recent years. Most of these approaches are based on word frequencies, for words of a fixed length, or on word-matching statistics. Other approaches are based o…

Cited by 4 publications (6 citation statements)
References: 86 publications
“…Here, a spaced-word match is a pair of words from two sequences that are identical at certain positions, specified by a pre-defined binary pattern of match and don't-care positions, see [39] for a short review of alignment-free approaches based on spaced-word matches.…”
Section: Data Availability Statement
mentioning
confidence: 99%
“…Skmer [37] is a further improvement of this approach. In a previous paper, we proposed another way to infer evolutionary distances between DNA sequences based on the number of word matches between them, and we generalized this to so-called spaced-word matches [38]. Here, a spaced-word match is a pair of words from two sequences that are identical at certain positions, specified by a pre-defined binary pattern of match and don't-care positions, see [39] for a short review of alignment-free approaches based on spaced-word matches. The distance function proposed in [38] is now used by default in the program Spaced [40]. Theoretically, this distance measure is based on a simple model of molecular evolution without insertions or deletions.…”
mentioning
confidence: 99%
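
The definition quoted above can be illustrated with a minimal Python sketch. It assumes the binary pattern is given as a string of '1' (match) and '0' (don't-care) characters. The function name `spaced_word_matches` and the brute-force scan over all position pairs are illustrative assumptions, not the method used by the cited programs.

```python
def spaced_word_matches(seq1, seq2, pattern):
    """Return all (i, j) such that the words starting at seq1[i] and seq2[j]
    agree at every match position ('1') of the pattern.

    Don't-care positions ('0') are ignored when comparing the two words.
    """
    match_positions = [k for k, c in enumerate(pattern) if c == "1"]
    length = len(pattern)
    matches = []
    # Brute force: compare every window of seq1 with every window of seq2.
    for i in range(len(seq1) - length + 1):
        for j in range(len(seq2) - length + 1):
            if all(seq1[i + k] == seq2[j + k] for k in match_positions):
                matches.append((i, j))
    return matches


if __name__ == "__main__":
    # With pattern "101", only the first and third symbols must agree.
    print(spaced_word_matches("ACGT", "ATGT", "101"))
```

The quadratic scan keeps the example short; for long sequences one would more plausibly index the words by their symbols at the match positions.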
“…Furthermore, for a substitution matrix assigning a score to any two symbols of the nucleotide alphabet A, we define the score of a spaced word match as the sum of all substitution scores of nucleotide pairs aligned to each other at the don't care positions of P. Spaced-word matches - called spaced seeds in this context - have been originally introduced in sequence-database searching [24]; later they were applied in alignment-free sequence comparison, to estimate phylogenetic distances between DNA and protein sequences [30,21,20,32], see [29] for a review.…”
Section: Definitions
mentioning
confidence: 99%
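
The score described in this statement can be sketched in a few lines of Python. The pattern is again assumed to be a string of '1' (match) and '0' (don't-care) characters. The helper `toy_matrix`, with +1 for identical and -1 for different nucleotides, is a hypothetical stand-in for a real substitution matrix, and the function names are not taken from the cited work.

```python
def toy_matrix(alphabet="ACGT", match=1, mismatch=-1):
    """Hypothetical substitution matrix: +1 for identical symbols, -1 otherwise."""
    return {(a, b): (match if a == b else mismatch)
            for a in alphabet for b in alphabet}


def spaced_word_score(seq1, i, seq2, j, pattern, subst):
    """Sum the substitution scores of the symbol pairs aligned at the
    don't-care positions ('0') of the pattern, for the words starting
    at seq1[i] and seq2[j]."""
    return sum(subst[(seq1[i + k], seq2[j + k])]
               for k, c in enumerate(pattern) if c == "0")


if __name__ == "__main__":
    subst = toy_matrix()
    # Don't-care positions 1 and 3 align C/T and T/C, both mismatches.
    print(spaced_word_score("ACGTA", 0, "ATGCA", 0, "10101", subst))
```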
“…These methods use different techniques, such as dynamic programming, pairwise comparison, and heuristic methods associated with similarity metrics between the nucleotide sequences. However, these methods have some limitations: (1) They require some prior knowledge of the reference sequence; (2) they admit that there is contiguity between homologous regions; (3) they are computationally expensive for long sequences (for example, for only two sequences of length N, we have (2N)!/(N!)^2 possible gapped sequences and a time complexity of the length of the inputs and/or their products; and (4) they are not very efficient for specimens that have a high rate of genomic mutation, such as viruses [16, 17, 18, 19, 20].…”
Section: Introduction
mentioning
confidence: 99%