Our method (CRpred) uses sequence-based features and the sequence-derived PSI-BLAST profile. We used feature selection to reduce the dimensionality of the input (and explain the input) to support vector machine (SVM) classifier that provides predictions. Tests on eight datasets and side-by-side comparison with six modern structure- and sequence-based predictors show that CRpred provides predictions with quality comparable to current structure-based methods and better than sequence-based methods. The proposed method obtains 15-19% precision and 48-58% TP (true positive) rate, depending on the dataset used. CRpred also provides confidence values that allow selecting a subset of predictions with higher precision. The improved quality is due to newly designed features and careful parameterization of the SVM. The features incorporate amino acids characterized by the highest and the lowest propensities to constitute catalytic residues, Gly that provides flexibility for catalytic sites and sequence motifs characteristic to certain catalytic reactions. Our features indicate that catalytic residues are on average more conserved when compared with the general population of residues and that highly conserved amino acids characterized by high catalytic propensity are likely to form catalytic sites. We also show that local (with respect to the sequence) hydrophobicity contributes towards the prediction.
We investigate the relationship between the flexibility, expressed with B-factor, and the relative solvent accessibility (RSA) in the context of local, with respect to the sequence, neighborhood and related concepts such as residue depth. We observe that the flexibility of a given residue is strongly influenced by the solvent accessibility of the adjacent neighbors. The mean normalized B-factor of the exposed residues with two buried neighbors is smaller than that of the buried residues with two exposed neighbors. Inclusion of RSA of the neighboring residues (local RSA) significantly increases correlation with the B-factor. Correlation between the local RSA and B-factor is shown to be stronger than the correlation that considers local distance- or volume-based residue depth. We also found that the correlation coefficients between B-factor and RSA for the 20 amino acids, called flexibility-exposure correlation index, are strongly correlated with the stability scale that characterizes the average contributions of each amino acid to the folding stability. Our results reveal that the predicted RSA could be used to distinguish between the disordered and ordered residues and that the inclusion of local predicted RSA values helps providing a better contrast between these two types of residues. Prediction models developed based on local actual RSA and local predicted RSA show similar or better results in the context of B-factor and disorder predictions when compared with several existing approaches. We validate our models using three case studies, which show that this work provides useful clues for deciphering the structure-flexibility-function relation.
Structural class categorizes proteins based on the amount and arrangement of the constituent secondary structures. The knowledge of structural classes is applied in numerous important predictive tasks that address structural and functional features of proteins. We propose novel structural class assignment methods that use one-dimensional (1D) secondary structure as the input. The methods are designed based on a large set of low-identity sequences for which secondary structure is predicted from their sequence (PSSA(sc) model) or assigned based on their tertiary structure (SSA(sc)). The secondary structure is encoded using a comprehensive set of features describing count, content, and size of secondary structure segments, which are fed into a small decision tree that uses ten features to perform the assignment. The proposed models were compared against seven secondary structure-based and ten sequence-based structural class predictors. Using the 1D secondary structure, SSA(sc) and PSSA(sc) can assign proteins to the four main structural classes, while the existing secondary structure-based assignment methods can predict only three classes. Empirical evaluation shows that the proposed models are quite promising. Using the structure-based assignment performed in SCOP (structural classification of proteins) as the golden standard, the accuracy of SSA(sc) and PSSA(sc) equals 76 and 75%, respectively. We show that the use of the secondary structure predicted from the sequence as an input does not have a detrimental effect on the quality of structural class assignment when compared with using secondary structure derived from tertiary structure. Therefore, PSSA(sc) can be used to perform the automated assignment of structural classes based on the sequences.
Background: Residue depth allows determining how deeply a given residue is buried, in contrast to the solvent accessibility that differentiates between buried and solvent-exposed residues. When compared with the solvent accessibility, the depth allows studying deep-level structures and functional sites, and formation of the protein folding nucleus. Accurate prediction of residue depth would provide valuable information for fold recognition, prediction of functional sites, and protein design.
Identification and prediction of RNA-binding residues (RBRs) provides valuable insights into the mechanisms of protein-RNA interactions. We analyzed the contributions of a wide range of factors including amino acid sequence, evolutionary conservation, secondary structure and solvent accessibility, to the prediction/characterization of RBRs. Five feature sets were designed and feature selection was performed to find and investigate relevant features. We demonstrate that (1) interactions with positively charged amino acids Arg and Lys are preferred by the egatively charged nucleotides; (2) Gly provides flexibility for the RNA binding sites; (3) Glu with negatively charged side chain and several hydrophobic residues such as Leu, Val, Ala and Phe are disfavored in the RNA-binding sites; (4) coil residues, especially in long segments, are more flexible (than other secondary structures) and more likely to interact with RNA; (5) helical residues are more rigid and consequently they are less likely to bind RNA; and (6) residues partially exposed to the solvent are more likely to form RNA-binding sites. We introduce a novel sequence-based predictor of RBRs, RBRpred, which utilizes the selected features. RBRpred is comprehensively tested on three datasets with varied atom distance cutoffs by performing both five-fold cross validation and jackknife tests and achieves Matthew's correlation coefficient (MCC) of 0.51, 0.48 and 0.42, respectively. The quality is comparable to or better than that for state-of-the-art predictors that apply the distancebased cutoff definition. We show that the most important factor for RBRs prediction is evolutionary conservation, followed by the amino acid sequence, predicted secondary structure and predicted solvent accessibility. We also investigate the impact of using native vs. predicted secondary structure and solvent accessibility. The predictions are sufficient for the RBR prediction and the knowledge of the actual solvent accessibility helps in predictions for lower distance cutoffs.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.