2013
DOI: 10.1371/journal.pone.0072368

SCMCRYS: Predicting Protein Crystallization Using an Ensemble Scoring Card Method with Estimating Propensity Scores of P-Collocated Amino Acid Pairs

Abstract: Existing methods for predicting protein crystallization obtain high accuracy using various types of complemented features and complex ensemble classifiers, such as support vector machine (SVM) and Random Forest classifiers. It is desirable to develop a simple and easily interpretable prediction method with informative sequence features to provide insights into protein crystallization. This study proposes an ensemble method, SCMCRYS, to predict protein crystallization, for which each classifier is built by usin…

Cited by 83 publications (124 citation statements)
References 45 publications
“…The analysis of feature importance can provide valuable information for predicting its function and activity. Previously, AAC has been used for analyzing the inherent characteristics and patterns of many therapeutic peptides [36][37][38][39][40][41] and protein functions [42][43][44]. In this study, the mean decrease of Gini index (MDGI) was utilized to rank the importance of each AAC feature.…”
Section: Composition Analysis (mentioning)
confidence: 99%
“…The original SCM algorithm was first proposed by Huang et al [10] and was consequently applied to discriminate and analyze proteins with various functions [8–10, 13, 14] based on their sequence information. To train the classifier, two FASTA files are expected as the input: one for the positive training data and one for the negative training data.…”
Section: Methods (mentioning)
confidence: 99%
“…IGA computes a fitness function, where the area under the ROC curve (AUC) [15], and the Pearson’s correlation coefficient (R-value) between the initial and the optimized propensity scores of 20 amino acids are linearly combined. The weights for the AUC and R value were set based on previous studies [8–10]. (See Eq.…”
Section: Methods (mentioning)
confidence: 99%
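
The fitness function described in the statement above (a linear combination of AUC and the Pearson correlation between initial and optimized propensity scores) could be sketched roughly as follows. This is a minimal illustration, not the cited implementation: the function names and the default weights `w_auc` and `w_r` are placeholders, since the quoted text does not give the actual weight values.

```python
from math import sqrt


def auc(scores_pos, scores_neg):
    """AUC as the chance that a positive outscores a negative (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fitness(auc_value, r_value, w_auc=0.9, w_r=0.1):
    """Linear combination of AUC and R; the default weights are assumed placeholders."""
    return w_auc * auc_value + w_r * r_value
```

In this sketch, `pearson_r` would be called on the 20 initial and 20 optimized amino acid propensity scores, so the R term rewards optimized scores that stay close to the initial ones while the AUC term rewards classification performance.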
“…In total 428 protein properties were considered, including pI, average hydrophobicity and the frequency of neighboring pairs of amino acids. The accuracy for this method is between 75% and 81% and MCC ranges from 0.50 to 0.63. SCMCRYS is an ensemble method that creates “scoring cards” based on the identity of amino acid pairs separated by 0–9 residues as set of features [57]. The amino acid pairs are associated with a score derived from the frequencies of the pair in crystallizable and non-crystallizable proteins in the training set.…”
Section: Analysis Of Successful Crystallization Conditions (mentioning)
confidence: 99%
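
The scoring-card idea summarized in the statement above can be illustrated with a small sketch. It assumes, for illustration only, that a pair's score is the simple difference between its relative frequency in crystallizable and in non-crystallizable training sequences, and that a sequence is scored by the mean score of its pairs. The function names are hypothetical, and the published method may normalize or optimize the scores differently.

```python
from collections import Counter


def pair_frequencies(sequences, gap):
    """Relative frequency of residue pairs separated by `gap` residues."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - gap - 1):
            counts[(seq[i], seq[i + gap + 1])] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()} if total else {}


def build_scoring_card(positives, negatives, gap):
    """Assumed scoring rule: frequency in positives minus frequency in negatives."""
    freq_pos = pair_frequencies(positives, gap)
    freq_neg = pair_frequencies(negatives, gap)
    pairs = set(freq_pos) | set(freq_neg)
    return {p: freq_pos.get(p, 0.0) - freq_neg.get(p, 0.0) for p in pairs}


def score_sequence(seq, card, gap):
    """Mean card score over all pairs in `seq`; unseen pairs contribute 0."""
    pairs = [(seq[i], seq[i + gap + 1]) for i in range(len(seq) - gap - 1)]
    if not pairs:
        return 0.0
    return sum(card.get(p, 0.0) for p in pairs) / len(pairs)
```

An ensemble over the gaps mentioned in the statement could then be built as `cards = {g: build_scoring_card(pos, neg, g) for g in range(10)}`, with one card per separation of 0 to 9 residues. How the per-card predictions are combined is not described in the quoted text and is left out here.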