2005
DOI: 10.1002/prot.20605

Prediction of transporter family from protein sequence by support vector machine approach

Abstract: Transporters play key roles in cellular transport and metabolic processes, and in facilitating drug delivery and excretion. These proteins are classified into families based on the transporter classification (TC) system. Determination of the TC family of transporters facilitates the study of their cellular and pharmacological functions. Methods for predicting TC family without sequence alignments or clustering are particularly useful for studying novel transporters whose function cannot be determined by sequen…
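The abstract is cut off before it describes how sequences are turned into SVM inputs. As a minimal sketch of one alignment-free representation, the block below assumes a 20-dimensional amino acid composition vector; this descriptor choice and the function name `aa_composition` are hypothetical illustrations, not the paper's feature set.

```python
from typing import List

# Hypothetical descriptor (not taken from the paper): the 20 standard
# amino acids, in a fixed order that defines the vector layout.
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence: str) -> List[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Needs no alignment to any other sequence. Non-standard letters
    (e.g. X, B, Z) are ignored; if no standard residue is present,
    a zero vector is returned.
    """
    counts = {aa: 0 for aa in STANDARD_AA}
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(STANDARD_AA)
    return [counts[aa] / total for aa in STANDARD_AA]


if __name__ == "__main__":
    vec = aa_composition("MKTAYIAKQR")
    print(len(vec), vec[STANDARD_AA.index("A")])
```

Every sequence, whatever its length, maps to a vector of the same size, which is what lets a single SVM compare unrelated proteins directly.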

Cited by 59 publications (53 citation statements); references: 48 publications.

“…Statistically, a substantial percentage of active members can be classified by ML methods as active even if their family representative is in the inactive class [57]. Therefore, in principle, a reasonably good ML model can be derived from these putative inactive samples, which has been confirmed by a number of studies [53][54][55][57].…”
Section: Generation of Putative Inactive Compounds (mentioning, confidence: 98%)
“…Apart from the use of known inactive compounds and active compounds of other biological target classes as putative inactive compounds [7][8][9][11][12][13][14][40], a new approach extensively used for generating inactive proteins in ML classification of various classes of proteins [53][54][55] may be applied for generating putative inactive compounds. An advantage of this approach is its independence from knowledge of known inactive compounds and active compounds of other biological target classes, which enables broader coverage of the “inactive” chemical space when little is known about inactive compounds and compounds of other biological classes.…”
Section: Generation of Putative Inactive Compounds (mentioning, confidence: 99%)
“…This likely results from the availability of a more diverse set of negative data than of positive data, which enables the SVM to learn to recognize non-ARPs more effectively. Moreover, an SVM trained on an imbalanced data set tends to produce feature vectors that push the hyperplane toward the side with fewer samples, which can reduce accuracy for the class that has fewer samples or less diversity [30]. This may be another reason why the prediction accuracy for ARPs is generally lower than that for non-ARPs.…”
Section: Performance of the Model (mentioning, confidence: 92%)
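The passage above compares prediction accuracy on the positive class (ARPs) with accuracy on the negative class (non-ARPs). A minimal sketch of how those per-class figures are usually computed from a confusion matrix, alongside overall accuracy and the Matthews correlation coefficient (MCC), is shown below; the function name `binary_metrics` and the toy labels are hypothetical and are not data from any cited study.

```python
import math


def binary_metrics(y_true, y_pred):
    """Per-class accuracy, overall accuracy and MCC for binary labels.

    Labels: 1 = positive class (e.g. ARP), 0 = negative class (e.g. non-ARP).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    nan = float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else nan  # accuracy on positives
    specificity = tn / (tn + fp) if tn + fp else nan  # accuracy on negatives
    accuracy = (tp + tn) / len(pairs) if pairs else nan
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


if __name__ == "__main__":
    # Invented toy labels: 4 positives and 8 negatives (an imbalanced set).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
    print(binary_metrics(y_true, y_pred))
```

With these toy labels, overall accuracy is 0.75 while accuracy on the smaller positive class is only 0.5. This shows how a single headline accuracy can mask weaker performance on the minority class, which is why per-class figures are reported separately.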
“…The negative data set, representing non-ARPs, was selected by a commonly used procedure [29,30]. In this procedure, representative proteins of curated protein families in the Pfam database [31] that contain no known ARPs are selected as non-ARPs.…”
Section: Selection of ARPs and Non-ARPs (mentioning, confidence: 99%)
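The negative-set procedure quoted above can be sketched as a filter over protein families: any family containing even one known positive is excluded, and one representative is taken from each remaining family. The sketch assumes family membership is already available as an in-memory dictionary (no Pfam API or file format is used). It also assumes the first listed member stands in for the family representative, whereas curated representatives would be used in practice. The function name and all family and protein IDs are hypothetical.

```python
from typing import Dict, List, Set


def select_putative_negatives(
    families: Dict[str, List[str]],
    known_positives: Set[str],
) -> List[str]:
    """Return one representative from each family with no known positive.

    `families` maps a family ID to its member protein IDs. Assumption:
    the first listed member is treated as the family representative.
    """
    negatives = []
    for family_id in sorted(families):  # deterministic output order
        members = families[family_id]
        if not members:
            continue  # nothing to pick from an empty family
        if any(m in known_positives for m in members):
            continue  # family holds a known positive: exclude it entirely
        negatives.append(members[0])
    return negatives


if __name__ == "__main__":
    families = {"famA": ["p1", "p2"], "famB": ["q1", "q2", "q3"], "famC": ["r1"]}
    print(select_putative_negatives(families, known_positives={"q2"}))
```

Excluding a whole family when any member is positive, instead of dropping only the positive member, follows the quoted "contain no known ARPs" criterion. It keeps close relatives of known positives out of the negative set.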
“…SVM is a new machine learning approach originally proposed and developed by Vapnik (18). SVM applications have recently been actively pursued in various areas, from face recognition to genomics (19). It is a powerful tool for analyzing complex data derived from SELDI-TOF MS. We constructed a non-linear SVM classifier with a radial basis function (RBF) kernel, with the parameter gamma set to 0.6 and the cost of constraint violation set to 19, to discriminate the different groups.…”
Section: Bioinformatics Analysis (mentioning, confidence: 99%)
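The quoted setup uses an RBF kernel with gamma = 0.6. A minimal sketch of that kernel is shown below, assuming the common convention K(x, y) = exp(-gamma * ||x - y||^2). The quote does not state which convention or software was used. The function names are hypothetical, and the cost parameter (19 in the quote) belongs to the SVM optimization rather than to the kernel, so it does not appear here.

```python
import math
from typing import List, Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 0.6) -> float:
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_matrix(X: Sequence[Sequence[float]], gamma: float = 0.6) -> List[List[float]]:
    """Gram matrix of pairwise RBF similarities over the rows of X."""
    return [[rbf_kernel(a, b, gamma) for b in X] for a in X]


if __name__ == "__main__":
    X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
    for row in kernel_matrix(X):
        print([round(v, 4) for v in row])
```

Identical points score 1, and similarity decays toward 0 with squared distance; a larger gamma makes that decay sharper. If scikit-learn were used, the quoted settings would correspond to `sklearn.svm.SVC(kernel="rbf", gamma=0.6, C=19)`. That mapping assumes the authors used the same gamma convention.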