2017
DOI: 10.1007/978-3-319-71249-9_22

GaKCo: A Fast Gapped k-mer String Kernel Using Counting

Abstract: String Kernel (SK) techniques, especially those using gapped k-mers as features (gk), have obtained great success in classifying sequences like DNA, protein, and text. However, the state-of-the-art gk-SK runs extremely slowly when we increase the dictionary size (Σ) or allow more mismatches (M). This is because current gk-SK uses a trie-based algorithm to calculate the co-occurrence of mismatched substrings, resulting in a time cost of O(Σ^M). We propose a fast algorithm for calculating Gapped…
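To make the feature space in the abstract concrete, here is a minimal brute-force sketch of a gapped k-mer kernel computed by direct counting. It is not the GaKCo algorithm (the abstract is truncated before it is described) and it makes no attempt at speed. The function names and the parameters `g` (window length) and `k` (number of kept positions, so that up to g − k positions are gaps) are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq, g, k):
    """Count gapped k-mers of seq (illustrative helper, not from the paper).

    A feature is a pair (kept positions within a length-g window,
    characters at those positions); the other g - k positions are gaps.
    """
    counts = Counter()
    for start in range(len(seq) - g + 1):
        window = seq[start:start + g]
        for kept in combinations(range(g), k):
            counts[(kept, tuple(window[i] for i in kept))] += 1
    return counts


def gapped_kmer_kernel(x, y, g, k):
    """Dot product of the gapped k-mer count vectors of x and y."""
    cx = gapped_kmer_counts(x, g, k)
    cy = gapped_kmer_counts(y, g, k)
    # Counter returns 0 for features that y does not contain.
    return sum(n * cy[feature] for feature, n in cx.items())


if __name__ == "__main__":
    print(gapped_kmer_kernel("ACGT", "ACGA", g=3, k=2))
```

Each window contributes C(g, k) features, so this enumeration grows quickly with g and is only practical for small examples.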

Cited by 24 publications (14 citation statements)
References 19 publications
“…Baselines. We compare the prediction accuracy and efficiency of FastSK with 3 state-of-the-art string kernel baselines. For DNA and protein data, we baseline against gkmSVM-2.0 [8] and GaKCo [27]. For an NLP string kernel baseline, we use the Blended Spectrum Kernel [12,11], as it has recently achieved strong results in natural language processing.…”
Section: Experimental Setup and Results
Citation type: mentioning
confidence: 99%
“…FastSK directly counts the gapped k-mers shared between sequences, previous works (e.g. [7,8,18,19,27]) indirectly compute the kernel function by inferring the counts from a set of mismatch statistics. These methods take inspiration from [17], which uses the notion of a mismatch neighborhood to efficiently compute the (k, m)-mismatch kernel.…”
Section: Connecting To Related Work Mismatch Statistic-based String K…
Citation type: mentioning
confidence: 99%
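
One way to read "inferring the counts from a set of mismatch statistics" in the statement above: two length-g windows at Hamming distance m agree on g − m positions, so they share exactly C(g − m, k) gapped k-mers. The kernel value can therefore be recovered from the number of window pairs at each distance. The sketch below illustrates that identity by brute force; it is not the algorithm of any cited work (those avoid enumerating all pairs), and the function names and the parameters `g` and `k` are assumptions for illustration.

```python
from math import comb


def hamming(a, b):
    """Number of positions at which equal-length strings a and b differ."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))


def mismatch_profile(x, y, g):
    """profile[m] = number of (window of x, window of y) pairs, each of
    length g, at Hamming distance exactly m (illustrative helper)."""
    profile = [0] * (g + 1)
    for i in range(len(x) - g + 1):
        for j in range(len(y) - g + 1):
            profile[hamming(x[i:i + g], y[j:j + g])] += 1
    return profile


def kernel_from_mismatch_profile(x, y, g, k):
    """Gapped k-mer kernel inferred from the mismatch profile.

    A pair at distance m shares comb(g - m, k) gapped k-mers, which is 0
    once fewer than k positions match.
    """
    profile = mismatch_profile(x, y, g)
    return sum(n * comb(g - m, k) for m, n in enumerate(profile))


if __name__ == "__main__":
    print(mismatch_profile("ACGT", "ACGA", g=3))
    print(kernel_from_mismatch_profile("ACGT", "ACGA", g=3, k=2))
```

The pairwise loop costs time quadratic in the sequence lengths, which is why practical methods compute or approximate the profile without visiting every pair.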