Apoptosis proteins play a central role in the development and homeostasis of an organism, and they are important for understanding the mechanism of programmed cell death. Based on the ideas of coarse-grained description and grouping in physics, a new feature extraction method with grouped weight for protein sequences is presented and applied, together with a support vector machine, to the prediction of apoptosis protein subcellular localization. For the same training dataset and the same predictive algorithm, the overall prediction accuracy of our method in the Jackknife test is 13.2% and 15.3% higher than the accuracy based on the amino acid composition and the instability index, respectively. For apoptosis proteins in the "else" class in particular, the gains in prediction accuracy are 41.7 and 33.3 percentage points, respectively. The experimental results show that the new feature extraction method efficiently extracts the structural information implicit in protein sequences, and that the method achieves satisfactory performance despite its simplicity. The overall prediction accuracy of the EBGW_SVM model on dataset ZD98 reaches 92.9% in the Jackknife test, which is 8.2 to 20.4 percentage points higher than other existing models. For a new dataset, ZW225, the overall prediction accuracy of EBGW_SVM reaches 83.1%. These results imply that EBGW_SVM is a simple but efficient model for predicting apoptosis protein subcellular location.
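The evaluation protocol described above (a sequence-derived feature vector scored with a Jackknife, i.e. leave-one-out, test) can be illustrated with a minimal sketch. It uses the amino acid composition baseline mentioned in the abstract, not the grouped-weight encoding, whose details are not given here. The classifier is a 1-nearest-neighbour stand-in chosen to keep the example dependency-free; it is not the support vector machine used in the study. All function names, sequences, and labels below are hypothetical.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the 20-dimensional vector of standard-residue frequencies."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def nearest_neighbor_predict(train_x, train_y, query):
    """Stand-in classifier (1-nearest neighbour), NOT the SVM from the text."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best]


def jackknife_accuracy(features, labels, predict=nearest_neighbor_predict):
    """Hold out each sample once, fit on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Toy, made-up sequences and labels, purely for illustration.
toy_sequences = ["KKKKRRKA", "KRKRKKAA", "DDEEDDEA", "EEDDEDAA"]
toy_labels = ["nuclear", "nuclear", "cytoplasmic", "cytoplasmic"]
toy_features = [amino_acid_composition(s) for s in toy_sequences]
toy_accuracy = jackknife_accuracy(toy_features, toy_labels)
```

Passing a different `predict` callable (for example, a wrapper around an SVM library) swaps the classifier while leaving the leave-one-out loop unchanged, which mirrors the "same dataset, same algorithm, different features" comparison in the abstract.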
Protein pKa prediction is essential for the investigation of the pH-associated relationship between protein structure and function. In this work, we introduce a deep learning-based protein pKa predictor DeepKa, which is trained and validated with the pKa values derived from continuous constant-pH molecular dynamics (CpHMD) simulations of 279 soluble proteins. Here, the CpHMD implemented in the Amber molecular dynamics package has been employed (Huang, Y. J. Chem. Inf. Model. 2018, 58, 1372–1383; PMID 29949356). Notably, to avoid discontinuities at the boundary, grid charges are proposed to represent protein electrostatics. We show that the prediction accuracy by DeepKa is close to that by CpHMD benchmarking simulations, validating DeepKa as an efficient protein pKa predictor. In addition, the training and validation sets created in this study can be applied to the development of machine learning-based protein pKa predictors in the future. Finally, the grid charge representation is general and applicable to other topics, such as the protein–ligand binding affinity prediction.
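The abstract does not say how the grid charges are computed. As a rough illustration of the general idea, the sketch below spreads each atomic point charge over the eight surrounding grid points with trilinear weights, so that the grid values change continuously as an atom crosses a voxel boundary. This scheme is an assumption made for illustration and may differ from what DeepKa actually does; the function name `spread_charges` and all parameters are hypothetical.

```python
import math


def spread_charges(atoms, origin, spacing, shape):
    """Distribute point charges onto a regular 3D grid by trilinear weights.

    atoms   : iterable of (x, y, z, charge)
    origin  : (x0, y0, z0) coordinate of grid point (0, 0, 0)
    spacing : grid spacing, in the same length unit as the coordinates
    shape   : (nx, ny, nz) number of grid points along each axis
    """
    nx, ny, nz = shape
    grid = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for x, y, z, q in atoms:
        # Position of the atom in fractional grid coordinates.
        f = [(c - o) / spacing for c, o in zip((x, y, z), origin)]
        base = [math.floor(v) for v in f]
        frac = [v - b for v, b in zip(f, base)]
        for dx in (0, 1):
            for dy in (0, 1):
                for dz in (0, 1):
                    i, j, k = base[0] + dx, base[1] + dy, base[2] + dz
                    if not (0 <= i < nx and 0 <= j < ny and 0 <= k < nz):
                        continue  # weight falling outside the box is dropped
                    w = ((frac[0] if dx else 1 - frac[0])
                         * (frac[1] if dy else 1 - frac[1])
                         * (frac[2] if dz else 1 - frac[2]))
                    grid[i][j][k] += q * w
    return grid


# Illustrative call: one unit negative charge at the centre of a single cell.
example_grid = spread_charges([(0.5, 0.5, 0.5, -1.0)], (0.0, 0.0, 0.0), 1.0, (2, 2, 2))
total_charge = sum(v for plane in example_grid for row in plane for v in row)
```

With this weighting, the total charge on the grid equals the sum of the atomic charges as long as every atom lies inside the box, and the resulting 3D array can be fed to a convolutional model in the same way as a voxelized image.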