2004
DOI: 10.1093/bioinformatics/btg431

Mismatch string kernels for discriminative protein classification

Abstract: We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the problem of protein classification and remote homology detection. These kernels measure sequence similarity based on shared occurrences of fixed-length patterns in the data, allowing for mutations between patterns. Thus, the kernels provide a biologically well-motivated way to compare protein sequences without relying on family-based generative models such as hidden Ma…
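
To make the idea in the abstract concrete, the following is a minimal brute-force sketch of a kernel that counts shared fixed-length patterns (k-mers) while tolerating up to m substitutions per pattern. The function names, the parameters `k` and `m`, and the `AMINO_ACIDS` alphabet constant are illustrative assumptions, not taken from the paper, and the sketch makes no attempt at the efficiency a practical implementation would need.

```python
from collections import Counter
from itertools import combinations, product

# Assumed alphabet: the 20 standard amino acids (illustrative constant).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mismatch_neighborhood(kmer, m, alphabet):
    """Return all strings within Hamming distance <= m of kmer."""
    neighbors = {kmer}
    for d in range(1, m + 1):
        # Choose d positions to alter, then try every letter at each one.
        for positions in combinations(range(len(kmer)), d):
            for letters in product(alphabet, repeat=d):
                chars = list(kmer)
                for pos, letter in zip(positions, letters):
                    chars[pos] = letter
                # The set absorbs duplicates and unchanged substitutions.
                neighbors.add("".join(chars))
    return neighbors


def mismatch_features(seq, k, m, alphabet):
    """Map seq to counts over k-mers: each k-mer in seq adds 1 to every
    pattern in its mismatch neighborhood."""
    features = Counter()
    for i in range(len(seq) - k + 1):
        for beta in mismatch_neighborhood(seq[i:i + k], m, alphabet):
            features[beta] += 1
    return features


def mismatch_kernel(x, y, k, m, alphabet=AMINO_ACIDS):
    """Inner product of the two mismatch feature vectors."""
    fx = mismatch_features(x, k, m, alphabet)
    fy = mismatch_features(y, k, m, alphabet)
    return sum(count * fy[beta] for beta, count in fx.items())
```

With `m = 0` the sketch reduces to exact k-mer matching. The neighborhood grows quickly with `k` and `m`, so this enumeration is only practical for small values.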

Cited by 568 publications (548 citation statements)
References 31 publications
“…It would also be interesting to explore the application of some other types of string kernels on text clustering. Of particular interest would be the mismatch [9] kernels especially on raw (i.e. non pre-processed) text data.…”
Section: Discussion (mentioning)
confidence: 99%
“…Among the discriminative approaches, string kernel-based machine learning methods provide some of the most accurate results [27,19,16,28].…”
Section: String Kernels (mentioning)
confidence: 99%
“…More general, the so-called substring kernels [27] measure similarity between sequences based on common co-occurrence of exact sub-patterns (e.g., substrings). Inexact comparison, which is critical for effective matching (similarity evaluation) between text documents due to naturally occurring word substitutions, insertions, or deletions, is typically achieved by using different families of mismatch [19]. The mismatch kernel considers word (or character) n-gram counts with inexact matching of word (or character) n-grams.…”
Section: String Kernels (mentioning)
confidence: 99%
“…The spectrum kernel was used to detect remote homology detection [13] [14]. The input space X consists of all finite length sequences of characters from an alphabet A of size |A| = l (l = 20 for amino acids).…”
Section: Kernel Function (mentioning)
confidence: 99%
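
The spectrum kernel mentioned in the last statement can be sketched as an inner product of k-mer count vectors over sequences drawn from a finite alphabet. This is a minimal illustration assuming exact substring matching; the function names and the parameter `k` are hypothetical and are not taken from the cited works.

```python
from collections import Counter


def spectrum_features(seq, k):
    """Count every contiguous length-k substring (k-mer) of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k):
    """Inner product of the k-mer count vectors of x and y."""
    fx = spectrum_features(x, k)
    fy = spectrum_features(y, k)
    # Counter returns 0 for k-mers that are absent from y.
    return sum(count * fy[kmer] for kmer, count in fx.items())
```

Only k-mers that actually occur are stored, so the full feature space of size l^k (20^k for amino acids) is never materialized.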