Background: One of the major challenges in the post-genomic era is to provide functional annotations for the large number of proteins arising from genome sequencing projects. The function of many proteins depends on their interaction with small molecules or ligands. ATP is one such important ligand that plays a critical role as a coenzyme in the functionality of many proteins. There is a need to develop a method for identifying ATP interacting residues in ATP binding proteins (ABPs) in order to understand the mechanism of protein-ligand interaction. Results: We compared the amino acid composition of ATP interacting and non-interacting regions of proteins and observed that certain residues are preferred for interaction with ATP. This study describes a few models developed for identifying ATP interacting residues in a protein. All of these models were trained and tested on 168 non-redundant ABP chains. First, we developed a Support Vector Machine (SVM) based model using the primary sequence of proteins and obtained a maximum MCC of 0.33 with an accuracy of 66.25%. Second, another SVM based model was developed using the position specific scoring matrix (PSSM) generated by PSI-BLAST. The performance of this model improved significantly (MCC 0.5) over the previous one, where only the primary sequence of the proteins was used. Conclusion: This study demonstrates that it is possible to predict 'ATP interacting residues' in a protein with moderate accuracy using its sequence. Evolutionary information is important for the identification of 'ATP interacting residues', as it provides more information than the primary sequence alone. This method will be useful for researchers studying ATP-binding proteins. Based on this study, a web server has been developed for predicting 'ATP interacting residues' in a protein: http://www.imtech.res.in/raghava/atpint/.
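The abstract above reports performance as MCC and accuracy. As a reminder of how these two figures relate to a confusion matrix, here is a minimal sketch. The function names and the example counts are illustrative assumptions, not values or code from the study.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of residues classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0

# Hypothetical counts for a residue-level binary classifier.
print(accuracy(tp=60, tn=70, fp=30, fn=40))
print(mcc(tp=60, tn=70, fp=30, fn=40))
```

Unlike accuracy, MCC stays near zero for a classifier that guesses at random, which is presumably why it is quoted alongside accuracy for residue-level prediction.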
Background: Antibacterial peptides are one of the effector molecules of the innate immune system. Over the last few decades, several antibacterial peptides have been successfully approved as drugs by the FDA, which has prompted interest in these peptides. In our recent study we analyzed 999 antibacterial peptides collected from the Antibacterial Peptide Database (APD). We also developed methods to predict and classify these antibacterial peptides using Support Vector Machines (SVM). Results: During analysis we observed that certain residues are preferred over others in antibacterial peptides, particularly at the N and C termini. These observations, together with the increased number of antibacterial peptides in APD, encouraged us to develop a new and more robust method for predicting antibacterial peptides in a protein from its amino acid sequence, or for determining whether a given peptide has antibacterial properties. First, binary patterns of the 15 N-terminal residues were used to predict antibacterial peptides with an SVM, achieving an accuracy of 85.46% with a Matthews Correlation Coefficient (MCC) of 0.705. Then we used the binary pattern of the 15 C-terminal residues and achieved an accuracy of 85.05% with an MCC of 0.701. Later we developed a prediction method combining the N and C termini and achieved an accuracy of 91.64% with an MCC of 0.831. Finally, we developed an SVM based model using the amino acid composition of the whole peptide and achieved 92.14% accuracy with an MCC of 0.843. In this study we used five-fold cross-validation to develop all of these models and tested their performance on an independent dataset. We further classified antibacterial peptides according to their sources and achieved an overall accuracy of 98.95%. We also classified antibacterial peptides into their respective families and obtained satisfactory results. Conclusion: Among antibacterial peptides, there is a preference for certain residues at the N and C termini, which helps to discriminate them from non-antibacterial peptides. 
Amino acid composition of antibacterial peptides helps to demarcate them from non-antibacterial peptides and supports their further classification by source and family. Antibp2 will be helpful in discovering efficacious antibacterial peptides, which we hope will be useful against antibiotic-resistant bacteria. We also developed a user-friendly web server for the biological community.
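The two feature types named in the abstract above, binary patterns of terminal residues and whole-peptide amino acid composition, can be sketched as follows. This is an illustration under assumptions: the function names, the zero-padding of peptides shorter than 15 residues, and the example sequence are hypothetical and not taken from the study.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def binary_pattern(peptide, n=15, terminus="N"):
    """One-hot encode the n residues at the chosen terminus (20 bits each).

    Assumption: peptides shorter than n are zero-padded to a fixed length.
    """
    segment = peptide[:n] if terminus == "N" else peptide[-n:]
    vector = []
    for residue in segment:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    vector.extend([0] * (len(AMINO_ACIDS) * (n - len(segment))))
    return vector

def composition(peptide):
    """Percentage of each of the 20 residues in the whole peptide."""
    return [100.0 * peptide.count(aa) / len(peptide) for aa in AMINO_ACIDS]

# Hypothetical peptide sequence, used only to show the vector sizes.
example = "GIGKFLHSAKKFGKAFVGEIMNS"
print(len(binary_pattern(example, terminus="N")))  # 15 residues x 20 bits
print(len(composition(example)))                   # one value per residue type
```

A combined N and C terminus representation, as described in the abstract, could be obtained by concatenating the two binary vectors.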
Background: Membrane transport proteins (transporters) move hydrophilic substrates across hydrophobic membranes and play vital roles in most cellular functions. Transporters represent a diverse group of proteins that differ in topology, energy coupling mechanism, and substrate specificity as well as sequence similarity. Among the functional annotations of transporters, information about their transporting substrates is especially important. The experimental identification and characterization of transporters is currently costly and time-consuming. The development of robust bioinformatics-based methods for the prediction of membrane transport proteins and their substrate specificities is therefore an important and urgent task. Results: Support vector machine (SVM)-based computational models, which comprehensively utilize integrative protein sequence features such as amino acid composition, dipeptide composition, physico-chemical composition, biochemical composition, and position-specific scoring matrices (PSSM), were developed to predict the substrate specificity of seven transporter classes: amino acid, anion, cation, electron, protein/mRNA, sugar, and other transporters. An additional model to differentiate transporters from non-transporters was also developed. Among the developed models, the biochemical composition and PSSM hybrid model outperformed other models and achieved an overall average prediction accuracy of 76.69% with a Matthews correlation coefficient (MCC) of 0.49 and a receiver operating characteristic area under the curve (AUC) of 0.833 on our main dataset. This model also achieved an overall average prediction accuracy of 78.88% and MCC of 0.41 on an independent dataset. Conclusions: Our analyses suggest that evolutionary information (i.e., the PSSM) and the AAIndex are key features for the substrate specificity prediction of transport proteins. 
In comparison, similarity-based methods such as BLAST, PSI-BLAST, and hidden Markov models do not provide accurate predictions for the substrate specificity of membrane transport proteins. TrSSP: The Transporter Substrate Specificity Prediction Server, a web server that implements the SVM models developed in this paper, is freely available at http://bioinfo.noble.org/TrSSP.
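Of the sequence features listed in the abstract above, dipeptide composition is simple enough to sketch directly. The version below is a minimal illustration under assumptions: the function name, the percentage scaling, and the decision to skip pairs containing non-standard residues are not taken from the paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so that vectors are comparable.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]

def dipeptide_composition(sequence):
    """Percentage of each of the 400 overlapping residue pairs in a sequence.

    Assumption: pairs containing non-standard residues are skipped but
    still count toward the total number of pairs.
    """
    total = len(sequence) - 1
    if total <= 0:
        return [0.0] * len(DIPEPTIDES)
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = sequence[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    return [100.0 * counts[d] / total for d in DIPEPTIDES]

features = dipeptide_composition("MKTAYIAKQR")  # hypothetical sequence
print(len(features))
```

Because the vector length is fixed at 400 regardless of protein length, such features can be fed to an SVM without alignment.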
Pancreatic cancer (PC) is the fourth leading cause of cancer deaths in the United States, with a five-year patient survival rate of only 6%. Early detection and treatment of this disease are hampered by the lack of reliable diagnostic and prognostic markers. Recent studies have shown that dynamic changes in global DNA methylation and gene expression patterns play key roles in PC development and hence provide valuable insights for better understanding the initiation and progression of PC. In the current study, we used DNA methylation, gene expression, copy number, mutational and clinical data from pancreatic cancer patients. We independently investigated the DNA methylation and differential gene expression profiles between normal and tumor samples and correlated methylation levels with gene expression patterns. We observed a total of ~23 thousand differentially methylated CpG sites (Δβ≥0.1) between normal and tumor samples, where the majority of the CpG sites are hypermethylated in PC, and this phenomenon is more prominent in the 5′UTRs and promoter regions than in the gene bodies. Differential methylation is observed in genes associated with the homeobox domain, cell division and differentiation, cytoskeleton, epigenetic regulation and development, pancreatic development and pancreatic signaling, and pancreatic cancer core signaling pathways. Correlation analysis suggests that methylation in the promoter region and 5′UTR has mostly negative correlations with gene expression, while gene body and 3′UTR associated methylation has positive correlations. Regulatory element analysis suggests that HOX cluster and histone core proteins are upstream regulators of hypomethylation, while SMAD4, STAT4, STAT5B and zinc finger proteins (ZNF) are upstream regulators of hypermethylation. Non-negative matrix factorization (NMF) clustering of differentially methylated sites generated three clusters in PCs, suggesting the existence of distinct molecular subtypes. 
Cluster 1 and cluster 2 showed samples enriched with clinical phenotypes like neoplasm histological grade and pathologic T-stage T3, respectively, while cluster 3 showed the enrichment of samples with neoplasm histological grade G1. To the best of our knowledge, this is the first genome-scale methylome analysis of PC data from TCGA. Our clustering analysis provides a strong basis for future work on the molecular subtyping of epigenetic regulation in pancreatic cancer.
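The Δβ≥0.1 criterion mentioned in the abstract above can be illustrated with a small sketch. It assumes that Δβ is the difference between the mean tumor and mean normal beta values at a CpG site and that the threshold applies to its magnitude. The abstract does not spell this out, and a real analysis would normally add a statistical test, so the function name and inputs below are hypothetical.

```python
from statistics import mean

def classify_cpg(tumor_betas, normal_betas, threshold=0.1):
    """Label one CpG site by its mean beta-value difference (tumor - normal).

    Assumption: a site is called differentially methylated when the
    absolute difference of group means reaches the threshold.
    """
    delta = mean(tumor_betas) - mean(normal_betas)
    if delta >= threshold:
        return "hyper", delta
    if delta <= -threshold:
        return "hypo", delta
    return "unchanged", delta

# Hypothetical beta values (fraction methylated, between 0 and 1).
label, delta = classify_cpg([0.80, 0.90, 0.85], [0.20, 0.30, 0.25])
print(label)
```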
Background: Guanosine triphosphate (GTP)-binding proteins play an important role in regulation of G-protein. Thus, prediction of GTP interacting residues in a protein is one of the major challenges in the field of computational biology. In this study, an attempt has been made to develop a computational method for predicting GTP interacting residues in a protein with high accuracy (Acc), precision (Prec) and recall (Rc). Results: All the models developed in this study were trained and tested on a non-redundant (40% similarity) dataset using five-fold cross-validation. First, we developed neural network based models using single sequence and PSSM profile and achieved a maximum Matthews Correlation Coefficient (MCC) of 0.24 (Acc 61.30%) and 0.39 (Acc 68.88%), respectively. Second, we developed support vector machine (SVM) based models using single sequence and PSSM profile and achieved a maximum MCC of 0.37 (Prec 0.73, Rc 0.57, Acc 67.98%) and 0.55 (Prec 0.80, Rc 0.73, Acc 77.17%), respectively. In this work, we introduce for the first time the concept of predicting GTP interacting dipeptides (two consecutive GTP interacting residues) and tripeptides (three consecutive GTP interacting residues). We developed an SVM based model for predicting GTP interacting dipeptides using PSSM profile and achieved an MCC of 0.64 with precision 0.87, recall 0.74 and accuracy 81.37%. Similarly, an SVM based model was developed for predicting GTP interacting tripeptides using PSSM profile and achieved an MCC of 0.70 with precision 0.93, recall 0.73 and accuracy 83.98%. Conclusion: These results show that the PSSM based method performs better than the single sequence based method. The prediction models based on dipeptides or tripeptides are more accurate than the traditional model based on single residues. A web server, "GTPBinder" (http://www.imtech.res.in/raghava/gtpbinder/), based on the above models has been developed for predicting GTP interacting residues in a protein.
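The abstract above defines a GTP interacting dipeptide as two consecutive interacting residues and a tripeptide as three. Given per-residue labels, such positions could be located as in the following sketch. The function name and the 0/1 label encoding are assumptions made for illustration, not the authors' implementation.

```python
def interacting_kmers(labels, k):
    """Return start indices of every run of k consecutive interacting residues.

    labels: one 0/1 flag per residue (1 = interacting), an assumed encoding.
    k: 2 for dipeptides, 3 for tripeptides.
    """
    return [
        i for i in range(len(labels) - k + 1)
        if all(labels[i:i + k])
    ]

# Hypothetical labels for a nine-residue stretch.
labels = [0, 1, 1, 1, 0, 0, 1, 1, 0]
print(interacting_kmers(labels, 2))  # dipeptide start positions
print(interacting_kmers(labels, 3))  # tripeptide start positions
```

Overlapping windows are counted separately here, so a run of three interacting residues yields two dipeptides and one tripeptide.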