Functional Representation of Enzymes by Specific Peptides

Kunik, Vered; Meroz, Yasmine; Solan, Zach; Sandbank, Ben; Weingart, Uri; Ruppin, Eytan; Horn, D.

doi:10.1371/journal.pcbi.0030167

Cited by 19 publications (21 citation statements)

References 32 publications

“…As a result, cases of remote homology which may be captured by DME could have been missed by BLAST-based assignments, as was demonstrated by [6] and by [8]. The SP-based search has two other advantages over BLAST: it is conceptually simpler, relying only on a look-up table, and it points to specific locations on the queried protein which may be relevant to the expected catalytic function of that enzyme.…”

Section: Discussion (mentioning; confidence: 99%)

Data mining of enzymes using specific peptides

2009

Self Cite

Background: Predicting the function of a protein from its sequence is a long-standing challenge of bioinformatic research, typically addressed using either sequence-similarity or sequence-motifs. We employ the novel motif method that consists of Specific Peptides (SPs) that are unique to specific branches of the Enzyme Commission (EC) functional classification. We devise the Data Mining of Enzymes (DME) methodology that allows for searching SPs on arbitrary proteins, determining from its sequence whether a protein is an enzyme and what the enzyme's EC classification is.

Results: We extract novel SP sets from Swiss-Prot enzyme data. Using a training set of July 2006, and test sets of July 2008, we find that the predictive power of SPs, both for true-positives (enzymes) and true-negatives (non-enzymes), depends on the coverage length of all SP matches (the number of amino-acids matched on the protein sequence). DME is quite different from BLAST. Comparing the two on an enzyme test set of July 2008, we find that DME has lower recall. On the other hand, DME can provide predictions for proteins regarded by BLAST as having low homologies with known enzymes, thus supplying complementary information. We test our method on a set of proteins belonging to 10 bacteria, dated July 2008, establishing the usefulness of the coverage-length cutoff to determine true-negatives. Moreover, sifting through our predictions we find that some of them have been substantiated by Swiss-Prot annotations by July 2009. Finally we extract, for production purposes, a novel SP set trained on all Swiss-Prot enzymes as of July 2009. This new set increases considerably the recall of DME. The new SP set is being applied to three metagenomes: Sargasso Sea with over 1,000,000 proteins, producing predictions of over 220,000 enzymes, and two human gut metagenomes. The outcome of these analyses can be characterized by the enzymatic profile of the metagenomes, describing the relative numbers of enzymes observed for different EC categories.

Conclusions: Employing SPs for predicting enzymatic activity of proteins works well once one utilizes coverage-length criteria. In our analysis, L ≥ 7 has led to highly accurate results.
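The SP search described above reduces to string look-ups followed by a coverage-length test. The Python sketch below illustrates that pipeline under stated assumptions: the peptides and EC labels in `SP_TABLE` are invented toy values, the function names are hypothetical, and the rule of grouping matches by EC label and reporting the best-covered label is an assumption, not the published DME procedure. Only the look-up table, the match positions on the query, coverage length as the number of matched residues, and the L ≥ 7 cutoff come from the text.

```python
from collections import defaultdict

def build_index(sp_table):
    """Group SPs by length so every window of a query becomes one dict lookup."""
    index = defaultdict(dict)
    for peptide, ec in sp_table.items():
        index[len(peptide)][peptide] = ec
    return index

def find_sp_matches(protein, index):
    """Return sorted (start, peptide, ec) tuples for every SP found in protein.

    The 0-based start positions point to locations on the queried protein.
    """
    hits = []
    for length, table in index.items():
        for start in range(len(protein) - length + 1):
            window = protein[start:start + length]
            ec = table.get(window)
            if ec is not None:
                hits.append((start, window, ec))
    return sorted(hits)

def coverage_length(hits):
    """Distinct residues covered by the union of matches (overlaps counted once)."""
    covered = set()
    for start, peptide, _ec in hits:
        covered.update(range(start, start + len(peptide)))
    return len(covered)

def predict_ec(protein, index, min_coverage=7):
    """Hypothetical decision rule: measure coverage per EC label and return the
    best-covered label if it reaches min_coverage, else None (non-enzyme)."""
    by_ec = defaultdict(list)
    for hit in find_sp_matches(protein, index):
        by_ec[hit[2]].append(hit)
    best_ec, best_len = None, 0
    for ec, ec_hits in by_ec.items():
        length = coverage_length(ec_hits)
        if length > best_len:
            best_ec, best_len = ec, length
    return best_ec if best_len >= min_coverage else None

# Toy look-up table: peptides and EC labels are invented for illustration only.
SP_TABLE = {"KLVGA": "1.1.1", "GAPWK": "1.1.1", "MSTQ": "2.7.1"}
INDEX = build_index(SP_TABLE)
```

For the toy protein `MKLVGAPWKR`, the two 5-residue matches overlap on `GA`, so the coverage length is 8 rather than the 10 obtained by summing match lengths; counting distinct residues is one reading of "the number of amino-acids matched on the protein sequence". A sequence matched only by `MSTQ` (coverage 4) falls below the cutoff and is reported as a non-enzyme.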

“…The study by Kunik et al (2007) has answered this question affirmatively. It has applied MEX to a large set of proteins known as enzymes, derived from the Swiss-Prot database (release 48.3 in Oct 2005).…”

Section: Specific Peptides (mentioning; confidence: 91%)

“…Such a comparison, based on linear SVM (Vapnik 1995; Schölkopf 1997) training and testing on the class of oxidoreductases leads to the conclusion that MEX motifs do better than the other methods (Kunik et al 2007).…”

Section: Specific Peptides (mentioning; confidence: 99%)

Syntactic structures in languages and biology

Horn

2007

Cogn Process

Self Cite

Both natural languages and cell biology make use of one-dimensional encryption. Their investigation calls for syntactic deciphering of the text and semantic understanding of the resulting structures. Here we discuss recently published algorithms that allow for such searches: automatic distillation of structure (ADIOS) that is successful in discovering syntactic structures in linguistic texts and its motif extraction (MEX) component that can be used for uncovering motifs in DNA and protein sequences. The underlying principles of these syntactic algorithms and some of their results will be described.

“…Based on MEX, Kunik et al [38] have developed a method to identify and classify enzymes based on Specific Peptides (SPs). The SPs are strings of amino acids, derived from enzyme sequences using MEX and showed that the coverage of the SPs is better than that of PROSITE motifs in finding the function of enzyme families.…”

Section: Methods by Feature Space (mentioning; confidence: 99%)

Computational Approaches for Automated Classification of Enzyme Sequences

Mohammed, Guda

2011

JPB

Determining the functional role(s) of enzymes is very important to build the metabolic blueprint of an organism and to identify the potential roles enzymes may play in metabolic and disease pathways. With exponential growth in gene and protein sequence data, it is not feasible to experimentally characterize the function(s) of all enzymes. Alternatively, computational methods can be used to annotate the enormous amount of unannotated enzyme sequences. For function prediction and classification of enzymes, features based on amino acid composition, sequence and structural properties, domain composition and specific peptide information have been widely used by different computational approaches. Each feature space has its own merits and limitations on the overall prediction accuracy. Prediction accuracy improves when machine-learning methods are used to classify enzymes. Given the incomplete and unbalanced nature of annotations in biological databases, ensemble methods or methods that bank on a combination of orthogonal features are more desirable for achieving higher accuracy and coverage in enzyme classification. In this review article, we systematically describe all the features and methods used thus far for enzyme class prediction. To the authors’ knowledge, this review represents the most exhaustive description of methods used for computational prediction of enzyme classes.