B-cell epitopes play a vital role in the development of peptide vaccines, in the diagnosis of diseases, and in allergy research. Experimental methods for characterizing epitopes are time-consuming and resource-intensive, so epitope prediction methods can considerably speed up this work. In this study, standard feed-forward (FNN) and recurrent neural networks (RNN) were used to predict B-cell epitopes in antigenic sequences. The networks were trained and tested on a clean data set consisting of 700 non-redundant B-cell epitopes obtained from the Bcipep database and an equal number of non-epitopes selected randomly from the Swiss-Prot database, using different input window lengths and numbers of hidden units. Maximum accuracy was obtained with a recurrent neural network (Jordan network) with a single hidden layer of 35 hidden units and a window length of 16. The final network yields an overall prediction accuracy of 65.93% in fivefold cross-validation, with a corresponding sensitivity, specificity, and positive prediction value of 67.14, 64.71, and 65.61%, respectively. The RNN (JE) was more successful than the FNN in predicting B-cell epitopes, and peptide length was also found to be important for predicting B-cell epitopes from antigenic sequences. The web server ABCpred is freely available at www.imtech.res.in/raghava/abcpred/.
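The abstract reports sensitivity, specificity, positive prediction value, and accuracy without giving formulas. The sketch below uses the conventional confusion-matrix definitions (an assumption; how the paper aggregated the five folds is not stated here) and also computes the Matthews correlation coefficient (MCC) that the later abstracts report. The `Confusion` class and `metrics` function are hypothetical names, and the counts in the example are made up.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from a binary classifier (hypothetical helper, not from ABCpred)."""
    tp: int  # epitopes predicted as epitopes
    fn: int  # epitopes predicted as non-epitopes
    tn: int  # non-epitopes predicted as non-epitopes
    fp: int  # non-epitopes predicted as epitopes


def metrics(c: Confusion) -> dict[str, float]:
    """Conventional threshold-dependent metrics, returned as fractions in [0, 1]."""
    sensitivity = c.tp / (c.tp + c.fn)  # fraction of true epitopes recovered
    specificity = c.tn / (c.tn + c.fp)  # fraction of non-epitopes correctly rejected
    ppv = c.tp / (c.tp + c.fp)          # positive prediction value (precision)
    accuracy = (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)
    denom = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    mcc = (c.tp * c.tn - c.fp * c.fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Made-up counts for illustration only.
print(metrics(Confusion(tp=8, fn=2, tn=7, fp=3)))
```

With a balanced data set such as the 700 epitopes and 700 non-epitopes used here, accuracy reduces to the mean of sensitivity and specificity, and indeed (67.14 + 64.71)/2 ≈ 65.93, consistent with the reported accuracy.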
Background: Over the past few decades, scientific research has focused on developing peptide- and protein-based therapies for various diseases. Owing to their advantages over small molecules, including high specificity, high penetration, and ease of manufacturing, peptides have emerged as promising therapeutic molecules against many diseases. However, toxicity is one of the bottlenecks in peptide/protein-based therapy. Therefore, in the present study, we developed in silico models for predicting the toxicity of peptides and proteins.
Description: We obtained toxic peptides of 35 or fewer residues from various databases to develop prediction models. Non-toxic or random peptides were obtained from SwissProt and TrEMBL. Certain residues, such as Cys, His, Asn, and Pro, were abundant in toxic peptides and preferred at various positions. We developed models for predicting peptide toxicity based on a machine learning technique and a quantitative matrix, using various properties of peptides. The dipeptide-composition-based model achieved an accuracy of 94.50% with an MCC of 0.88. In addition, various motifs were extracted from the toxic peptides, and this information was combined with the dipeptide-based model to develop a hybrid model. To check the best model (based on dipeptide composition) for over-optimization, we evaluated its performance on independent datasets and achieved an accuracy of around 90%. Based on this study, we developed a web server, ToxinPred, which would be helpful in predicting (i) whether peptides are toxic or non-toxic, (ii) the minimum mutations needed to increase or decrease a peptide's toxicity, and (iii) toxic regions in proteins.
Conclusion: ToxinPred is a unique in silico method of its kind, which will be useful for predicting the toxicity of peptides and proteins. In addition, it will be useful for designing peptides with minimal toxicity and for discovering toxic regions in proteins. We hope that the development of ToxinPred will provide momentum to peptide/protein-based drug discovery (http://crdd.osdd.net/raghava/toxinpred/).
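Dipeptide composition, the feature set behind ToxinPred's best model, turns a peptide of any length into a fixed 400-dimensional vector. The sketch below shows one common way to compute it: the percentage of each ordered pair of the 20 standard residues among all overlapping pairs. Normalizing by the number of valid pairs, and skipping pairs that contain non-standard residues, are assumptions (the abstract does not specify these details), and `dipeptide_composition` is a hypothetical name.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 400 ordered residue pairs, in a fixed order so vectors are comparable.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq: str) -> list[float]:
    """Percentage of each dipeptide among the valid overlapping pairs in seq."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues such as X
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:  # peptide too short, or made only of non-standard residues
        return [0.0] * len(DIPEPTIDES)
    return [100.0 * counts[d] / total for d in DIPEPTIDES]


vec = dipeptide_composition("ACAC")
print(len(vec), vec[DIPEPTIDES.index("AC")], vec[DIPEPTIDES.index("CA")])
```

For "ACAC" the overlapping pairs are AC, CA, and AC, so AC receives two thirds of the composition and CA one third; every other entry is zero.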
In this study, a systematic attempt was made to integrate various approaches in order to predict allergenic proteins with high accuracy. The training and testing dataset consists of 578 allergens and 700 non-allergens obtained from A. K. Bjorklund, D. Soeria-Atmadja, A. Zorzet, U. Hammerling and M. G. Gustafsson (2005) Bioinformatics, 21, 39–50. First, we developed support vector machine (SVM) methods using amino acid and dipeptide composition, which achieved accuracies of 85.02 and 84.00%, respectively. Second, a motif-based method was developed using the MEME/MAST software; it achieved a sensitivity of 93.94% with a specificity of 33.34%. Third, a database of known IgE epitopes was searched; this predicted allergenic proteins with 17.47% sensitivity at a specificity of 98.14%. Fourth, we predicted allergenic proteins by performing a BLAST search against allergen representative peptides. Finally, hybrid approaches were developed that combine two or more of these approaches. The performance of all these algorithms was evaluated on an independent dataset of 323 allergens and 101 725 non-allergens obtained from Swiss-Prot. A web server, AlgPred, has been developed for predicting allergenic proteins and for mapping IgE epitopes on allergenic proteins. AlgPred is available at .
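AlgPred's third approach, and its IgE epitope mapping, rely on searching a protein against known IgE epitopes. The abstract does not say how this search is performed, so the sketch below assumes the simplest case: exact, case-insensitive substring matching, with every occurrence reported in 1-based inclusive coordinates. A protein is then called allergenic if at least one known epitope maps onto it. Both function names and the example sequences are hypothetical.

```python
def map_epitopes(protein: str, epitopes: list[str]) -> list[tuple[str, int, int]]:
    """Return (epitope, start, end) for every exact occurrence, 1-based inclusive."""
    protein = protein.upper()
    hits = []
    for epitope in epitopes:
        epitope = epitope.upper()
        if not epitope:
            continue  # an empty string would "match" at every position
        start = protein.find(epitope)
        while start != -1:
            hits.append((epitope, start + 1, start + len(epitope)))
            start = protein.find(epitope, start + 1)  # keep overlapping occurrences
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


def is_predicted_allergen(protein: str, epitopes: list[str]) -> bool:
    """Call a protein allergenic if any known IgE epitope maps onto it."""
    return bool(map_epitopes(protein, epitopes))


# Made-up sequence and epitopes, for illustration only.
known = ["AKQRQ", "SF"]
print(map_epitopes("MKTAYIAKQRQISFVKSHFSRQ", known))
```

This prints `[('AKQRQ', 7, 11), ('SF', 13, 14)]`. A real search might instead allow mismatches or use alignment; exact matching is simply the easiest variant to reason about.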
Background: Interferon-gamma (IFN-γ) produced by CD4+ T helper cells activated through MHC class II makes a substantial contribution to the control of infections, such as those caused by Mycobacterium tuberculosis. Numerous methods have been developed for predicting MHC class II binders that can activate T helper cells. However, to the best of the authors' knowledge, no method has so far been developed that can predict which type of cytokine will be secreted in response to these MHC class II binders or T helper epitopes. In this study, an attempt has been made to predict IFN-γ inducing peptides. The main dataset used in this study contains 3705 IFN-γ inducing and 6728 non-IFN-γ inducing MHC class II binders. Another dataset, IFNgOnly, contains 4483 IFN-γ inducing epitopes and 2160 epitopes that induce cytokines other than IFN-γ. In addition, an alternate dataset contains IFN-γ inducing peptides and an equal number of random peptides.
Results: Peptide length, positional conservation of residues, and amino acid composition were observed to affect the IFN-γ inducing capability of these peptides. We identified motifs in IFN-γ inducing binders/peptides using the MERCI software. Our analysis indicates that IFN-γ inducing and non-inducing peptides can be discriminated using these features. We developed models for predicting IFN-γ inducing peptides using various approaches, including a machine learning technique, a motif-based search, and a hybrid approach. Our best model, based on the hybrid approach, achieved a maximum prediction accuracy of 82.10% with an MCC of 0.62 on the main dataset. We also developed a hybrid model on the IFNgOnly dataset, which achieved a maximum accuracy of 81.39% with an MCC of 0.57.
Conclusion: Based on this study, we have developed a web server for (i) predicting IFN-γ inducing peptides, (ii) virtual screening of peptide libraries, and (iii) identifying IFN-γ inducing regions in antigens (http://crdd.osdd.net/raghava/ifnepitope/).
Reviewers: This article was reviewed by Prof Kurt Blaser, Prof Laurence Eisenlohr and Dr Manabu Sugai.
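The abstract describes the hybrid model only as a combination of a machine learning technique with a MERCI motif search. One plausible way to combine the two, sketched below under explicit assumptions, is to add a fixed bonus to the classifier's score whenever the peptide contains a known motif and then apply a threshold. The bonus, the threshold, the contiguous-string motifs (MERCI motifs can be more general), and the function names are all hypothetical, and the classifier score is taken as an input rather than computed.

```python
def contains_motif(peptide: str, motifs: list[str]) -> bool:
    """True if any motif occurs as a contiguous substring (simplified motif model)."""
    peptide = peptide.upper()
    return any(motif.upper() in peptide for motif in motifs if motif)


def hybrid_predict(
    peptide: str,
    ml_score: float,
    motifs: list[str],
    motif_bonus: float = 0.5,  # hypothetical weight for a motif hit
    threshold: float = 0.0,    # hypothetical decision threshold
) -> tuple[bool, float]:
    """Combine a classifier score with motif evidence; return (is_inducer, score)."""
    score = ml_score + (motif_bonus if contains_motif(peptide, motifs) else 0.0)
    return score >= threshold, score


# Made-up motif and scores: a motif hit lifts a weakly negative score over the threshold.
motifs = ["KQR"]
print(hybrid_predict("LLKQRSTA", ml_score=-0.2, motifs=motifs))
print(hybrid_predict("LLAAGSTA", ml_score=-0.2, motifs=motifs))
```

In practice the bonus and threshold would presumably be tuned by cross-validation, for example to maximize MCC.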
RNA-binding proteins (RBPs) play key roles in the post-transcriptional control of gene expression, which, along with transcriptional regulation, is a major way of regulating gene expression patterns during development. Identifying and predicting RNA binding sites is therefore an important step toward a comprehensive understanding of how RBPs control organism development. By combining evolutionary information with a support vector machine (SVM), we have developed an improved method for predicting RNA binding sites, or RNA interacting residues, in a protein sequence. The prediction models developed in this study were trained and tested on 86 RNA binding protein chains and evaluated using fivefold cross-validation. First, an SVM model was developed that achieved a maximum Matthews correlation coefficient (MCC) of 0.31. When evolutionary information from multiple sequence alignments, in the form of PSSM profiles, was used as input to the SVM, the MCC improved from 0.31 to 0.45, well above the maximum MCC achieved by previous methods (0.41) on the same dataset. SVM models were also developed on an alternative dataset of 107 RBP chains; using PSSM profiles as input, training/testing on this dataset achieved a maximum MCC of 0.32. Overall, the SVM models developed in this study outperform existing methods on the same datasets. A web server, 'Pprint', has also been developed for predicting RNA binding residues in a protein sequence and is freely available at http://www.imtech.res.in/raghava/pprint/.
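Residue-level predictors such as Pprint feed the SVM one example per residue. A common encoding, sketched here as an assumption because the abstract does not give Pprint's window size or any scaling of PSSM scores, concatenates the PSSM rows of the residues in a window centred on the target residue, padding with zeros past either terminus. The default window of 7 and the name `pssm_window_features` are hypothetical; a real PSSM has 20 columns per residue, while the example uses 2 to stay readable.

```python
def pssm_window_features(pssm: list[list[float]], window: int = 7) -> list[list[float]]:
    """For each residue i, concatenate PSSM rows i-half .. i+half (zero-padded at the ends)."""
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centred on a residue")
    half = window // 2
    n_cols = len(pssm[0]) if pssm else 20  # 20 columns for a standard PSSM
    pad = [0.0] * n_cols
    features = []
    for i in range(len(pssm)):
        row: list[float] = []
        for j in range(i - half, i + half + 1):
            # Use the real PSSM row inside the sequence, zeros beyond either terminus.
            row.extend(pssm[j] if 0 <= j < len(pssm) else pad)
        features.append(row)
    return features


# Toy 3-residue "PSSM" with 2 columns per residue, for illustration only.
toy = [[1, 2], [3, 4], [5, 6]]
for vector in pssm_window_features(toy, window=3):
    print(vector)
```

With a window of 3, the first residue's vector is one zero-padded row followed by its own row and its neighbour's, i.e. `[0.0, 0.0, 1, 2, 3, 4]`. Each vector would then be paired with a binding/non-binding label for SVM training.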