Most modern tools used to predict sites of small ubiquitin-like modifier (SUMO) binding (referred to as SUMOylation) use algorithms, chemical features of the protein, and consensus motifs. However, these tools rarely consider how post-translational modification (PTM) information for other sites within the same protein affects prediction accuracy. This study applied the Random Forest machine learning method, together with motif screening models and a feature selection combination mechanism, to develop a SUMOylation prediction system, referred to as SUMOgo. In the prediction method, PTM sites were coded as new functional features in addition to structural features, such as sequence-based binary coding, encoded chemical features of proteins, and encoded secondary structure information that is important for PTM. Twenty cycles of prediction were conducted, each using a 1:1 combination of positive test data and randomly selected negative data. The Matthews correlation coefficient (MCC) of SUMOgo reached 0.511, which is higher than those of commonly used current tools. This study further verified the important role of PTM in SUMOgo and included a case study on CREB binding protein (CREBBP). The final tool is available at http://predictor.nchu.edu.tw/SUMOgo.
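The evaluation protocol described above (twenty prediction rounds, each pairing the positive test sites with an equal number of randomly drawn negatives, scored by MCC) can be sketched in Python. This is a minimal illustration, not SUMOgo's code: the `predict` callable, averaging the per-round MCC values, and the fixed random seed are assumptions made for the example.

```python
import math
import random


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention (assumed here): report 0 when any row/column sum is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


def balanced_rounds(positives, negatives, n_rounds=20, seed=0):
    """Yield (positives, negatives) test sets with a fresh 1:1 negative sample per round."""
    if len(negatives) < len(positives):
        raise ValueError("need at least as many negatives as positives")
    rng = random.Random(seed)
    pool = list(negatives)
    for _ in range(n_rounds):
        yield list(positives), rng.sample(pool, len(positives))


def mean_mcc(predict, positives, negatives, n_rounds=20, seed=0):
    """Average MCC of a hypothetical 0/1 site predictor over balanced rounds."""
    scores = []
    for pos, neg in balanced_rounds(positives, negatives, n_rounds, seed):
        tp = sum(1 for x in pos if predict(x) == 1)
        fn = len(pos) - tp
        fp = sum(1 for x in neg if predict(x) == 1)
        tn = len(neg) - fp
        scores.append(mcc(tp, tn, fp, fn))
    return sum(scores) / len(scores)
```

Drawing a new negative sample in every round keeps each test set balanced while letting the averaged MCC reflect more of the negative pool than a single 1:1 draw would.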
Drug development and investigation of protein function both require an understanding of protein subcellular localization. We developed a system, REALoc, that can predict the subcellular localization of both singleplex and multiplex human proteins (proteins found in one compartment or in several). This system, based on a comprehensive strategy, consists of two heterogeneous frameworks that integrate one-to-one and many-to-many machine learning methods. It uses sequence-based features, including amino acid composition, surface accessibility, weighted sign aa index, and sequence similarity profile, as well as gene ontology function-based features. REALoc predicts localization to six subcellular compartments: cell membrane, cytoplasm, endoplasmic reticulum/Golgi, mitochondrion, nucleus, and extracellular. REALoc achieved a 75.3% absolute true success rate in five-fold cross-validation and a 57.1% absolute true success rate on an independent test dataset, more than 10% higher than six other prediction systems. Lastly, we analyzed how the Vote and GANN models affect singleplex and multiplex localization prediction. REALoc is freely available at http://predictor.nchu.edu.tw/REALoc.
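Assuming the "absolute true success rate" follows its usual multi-label meaning (a protein counts as correct only when its predicted set of compartments exactly matches its annotated set), the metric can be sketched as follows. The function name and the compartment label strings are illustrative, not part of REALoc.

```python
from typing import FrozenSet, Iterable, List

# The six compartments named in the abstract; the exact strings are illustrative.
COMPARTMENTS: FrozenSet[str] = frozenset({
    "cell membrane", "cytoplasm", "endoplasmic reticulum/Golgi",
    "mitochondrion", "nucleus", "extracellular",
})


def absolute_true_rate(
    true_sets: Iterable[Iterable[str]],
    pred_sets: Iterable[Iterable[str]],
) -> float:
    """Fraction of proteins whose predicted label set exactly equals the true set."""
    truth: List[FrozenSet[str]] = [frozenset(t) for t in true_sets]
    preds: List[FrozenSet[str]] = [frozenset(p) for p in pred_sets]
    if len(truth) != len(preds):
        raise ValueError("true and predicted lists differ in length")
    if not truth:
        raise ValueError("no proteins to score")
    for labels in truth + preds:
        unknown = labels - COMPARTMENTS
        if unknown:
            raise ValueError(f"unknown compartment(s): {sorted(unknown)}")
    # A partially correct multiplex prediction earns no credit under this metric.
    hits = sum(1 for t, p in zip(truth, preds) if t == p)
    return hits / len(truth)
```

Because partial matches score zero, this metric is stricter for multiplex proteins than for singleplex ones, which is one reason it tends to be lower than per-label accuracy.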
Three novel naphthalimide-based derivatives were synthesized and tested in vitro as anticancer agents. Our previous report showed that C4-benzazole 1,8-naphthalimide derivatives inhibited murine melanoma well. Aiming to synthesize more potent agents, we found that compound 5, reported in this article, was 5- to 10-fold more potent than our previous best compounds. The unique structure of compound 5 consists of a naphthalimide framework whose C4 position is linked to an ethylenediamine group, the free amino group of which is coupled with a 2-picolinic acid moiety. Among the three target compounds synthesized in this study, compound 5 exhibited the most potent inhibitory activity toward human DNA topoisomerase II and an IC50 value of 2.6 ± 0.1 μM against murine B16F10 melanoma cells. Consistent with this finding, molecular docking also revealed that compound 5 has the highest affinity for human DNA topoisomerase II among the selected compounds. Compound 5 therefore has high potential as a lead compound.
Owing to the high throughput of mass spectrometry-based phosphoproteomics experiments, there is growing motivation to annotate the catalytic kinases of in vivo phosphorylation sites. Many studies have developed computational methods for identifying kinase-specific phosphorylation sites from linear amino acid sequences. Given increasing interest in the structural environment of protein phosphorylation sites, we developed a new scheme for identifying kinase-specific phosphorylation sites on protein three-dimensional (3D) structures. For a large-scale investigation of 3D structures, all experimentally identified phosphorylation sites were mapped to protein entries in the Protein Data Bank by sequence identity. In this work, a support vector machine (SVM) was applied to build predictive models from spatial amino acid composition and structural alphabet information. In cross-validation, most kinase-specific models trained with structural information outperformed models trained with sequence information only. Moreover, testing on an independent set not included in training demonstrated that the proposed method provides stable performance. This study demonstrates that considering spatial context can improve predictive performance compared with models that consider only local sequence motifs.
Index Terms: Phosphorylation, protein kinase, three-dimensional structure, structural alphabet, spatial amino acid composition.
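One plausible way to compute a spatial amino acid composition feature is to count the residue types whose coordinates fall within a fixed radius of the candidate site in the 3D structure. The sketch below assumes one representative coordinate per residue (for example, the Cα atom) and a hypothetical 10 Å default radius; the study's actual atom choice, radius, and encoding are not given in the abstract.

```python
import math
from typing import List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes

Coord = Tuple[float, float, float]


def spatial_aa_composition(
    sequence: str,
    coords: Sequence[Coord],
    site: int,
    radius: float = 10.0,  # hypothetical default, in angstroms
    include_site: bool = False,
) -> List[float]:
    """Normalized counts of the 20 standard residues within `radius` of residue `site`."""
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    center = coords[site]
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    total = 0
    for i, (aa, xyz) in enumerate(zip(sequence, coords)):
        if i == site and not include_site:
            continue  # optionally exclude the phosphorylation site itself
        if aa not in counts:
            continue  # skip non-standard residue codes
        if math.dist(center, xyz) <= radius:  # Euclidean distance in 3D
            counts[aa] += 1
            total += 1
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

Unlike a sequence window, this neighborhood can include residues that are far apart in sequence but close in the folded structure. The resulting 20-dimensional vector could then be concatenated with structural alphabet features and passed to an SVM classifier such as scikit-learn's `SVC`.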
Upon invasion by foreign pathogens, specific antibodies can identify specific foreign antigens and disable them. Because of this ability, antibodies can aid vaccine production and the detection of food allergens in patients. Many studies have focused on predicting linear B-cell epitopes, but only two tools are currently available to predict the subtype of an epitope. NIgPred was developed as a prediction tool for IgA, IgE, and IgG epitopes. NIgPred integrates various heterogeneous features with machine-learning approaches. Unlike previous studies, our study considered peptide-characteristic correlation and autocorrelation features. Sixty classifiers were applied to construct the best prediction model. Furthermore, a genetic algorithm and a hill-climbing algorithm were used to select the most suitable features, improving accuracy and reducing the time complexity of training. On independent test sets, NIgPred outperformed currently available tools in predicting IgE and IgG epitopes. Moreover, NIgPred achieved a prediction accuracy of 100% for the IgG epitopes of a coronavirus data set. NIgPred is publicly available on our website.
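Hill-climbing feature selection can be illustrated with a steepest-ascent search over feature subsets, where each step adds or removes the single feature that most improves a score. This is a generic sketch under stated assumptions: the `score` callable (for example, cross-validated accuracy of a classifier trained on the chosen features) is hypothetical, and how NIgPred combined this search with its genetic algorithm is not shown.

```python
from typing import Callable, FrozenSet, Optional, Sequence, Tuple


def hill_climb_select(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    max_iter: int = 1000,
) -> Tuple[FrozenSet[str], float]:
    """Steepest-ascent hill climbing over feature subsets.

    Neighbours of the current subset differ by exactly one feature (added or
    removed). The search moves to the best strictly improving neighbour and
    stops at a local optimum or after max_iter moves.
    """
    selected: FrozenSet[str] = frozenset()
    best = score(selected)
    for _ in range(max_iter):
        best_move: Optional[FrozenSet[str]] = None
        best_move_score = best
        for f in features:
            # Flip membership of one feature to form a neighbouring subset.
            candidate = selected - {f} if f in selected else selected | {f}
            s = score(candidate)
            if s > best_move_score:
                best_move, best_move_score = candidate, s
        if best_move is None:
            break  # no neighbour improves the score: local optimum reached
        selected, best = best_move, best_move_score
    return selected, best
```

Hill climbing only guarantees a local optimum, which is one reason it is often paired with a global search such as a genetic algorithm, for instance by refining the best subset the genetic algorithm finds.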