ABSTRACT

Predicted relative solvent accessibility (RSA) provides useful information for prediction of binding sites and reconstruction of the 3D-structure based on a protein sequence. Recent years have seen the development of several RSA prediction methods, including those that generate real values and those that predict discrete states (buried vs. exposed). We propose a novel method for real-value prediction that aims at minimizing the prediction error when compared with six existing methods. The proposed method is based on a two-stage Support Vector Regression (SVR) predictor. The improved prediction quality is a result of the developed composite sequence representation, which includes a custom-selected subset of features from the PSI-BLAST profile, secondary structure predicted with PSI-PRED, and a binary code that indicates the position of a given residue with respect to sequence termini. Cross validation tests on a benchmark dataset show that our method achieves 14.3 mean absolute error and 0.68 correlation. We also propose a confidence value that is associated with each predicted RSA value. The confidence is computed based on the difference in predictions from the two-stage SVR and a second two-stage Linear Regression (LR) predictor. The confidence values can be used to indicate ...

...ing gap between the number of known protein sequences and the number of known structures. Despite several decades of extensive research in tertiary structure prediction, this task is still a big challenge, especially for sequences that do not have a significant sequence similarity with known structures [1]. As a result, the predictions of the solvent accessibility [2] and the secondary structure [3] are addressed as an intermediate step towards the prediction of the tertiary structure. The relative solvent accessibility (RSA) reflects the degree to which a residue interacts with the solvent molecules. Since protein-protein and protein-ligand interactions occur at the protein surface, only the residues that have a large surface area exposed to the solvent can possibly bind to the ligands and other proteins. As a result, prediction of solvent accessibility provides useful information for prediction of binding sites [4] and is vitally important for understanding the binding mechanism of proteins [5].

Chan and Dill pointed out that the burial of core residues is the driving force in protein folding, which suggests that knowledge of the localization of individual residues (surface vs. buried) provides useful information to reconstruct the 3D-structure of proteins [6][7][8].

The existing solvent accessibility prediction methods use the protein sequence, which is converted into a fixed-size feature-based representation, as an input to predict the RSA for each of the residues. These methods can be divided into two main groups:

Real valued predictors predict RSA value (the definition is given in the Materials section ...
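The abstract states only that the confidence value depends on the difference between the predictions of the two-stage SVR and the two-stage LR predictors; the formula itself is not given here. The sketch below shows one possible reading, assuming RSA predictions on a 0 to 1 scale. The function name `agreement_confidence` and the linear mapping `1 - |svr - lr|` are hypothetical and are not the authors' formula, and the prediction values are made up.

```python
def agreement_confidence(svr_rsa, lr_rsa):
    """Hypothetical confidence: 1.0 when both predictors agree, lower as they diverge.

    Assumes both predictions are RSA values in [0, 1]; the linear
    mapping is an illustrative assumption, not the published formula.
    """
    for value in (svr_rsa, lr_rsa):
        if not 0.0 <= value <= 1.0:
            raise ValueError("RSA predictions are expected in [0, 1]")
    return 1.0 - abs(svr_rsa - lr_rsa)


# Made-up per-residue predictions from the two predictors
svr_predictions = [0.12, 0.55, 0.80]
lr_predictions = [0.10, 0.30, 0.82]

confidences = [
    agreement_confidence(s, l) for s, l in zip(svr_predictions, lr_predictions)
]

# Index of the residue where the two predictors disagree most
least_confident = min(range(len(confidences)), key=confidences.__getitem__)
```

Under this assumption, residues on which the two predictors disagree strongly would be flagged as less reliable; any monotone decreasing function of the difference would serve the same purpose.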
Predicted relative solvent accessibility (RSA) provides useful information for prediction of binding sites and reconstruction of the 3D-structure based on a protein sequence, which are at the very core of proteomics. Several RSA prediction methods have been published, including those that generate real values and those that predict discrete states (buried vs. exposed). We propose a novel method for real-valued prediction that aims to improve the prediction quality when compared with the existing methods. The proposed method combines Support Vector Regression (SVR) predictors into a two-stage architecture. The improved prediction quality comes from a composite sequence representation, which includes a custom-selected subset of features from the PSI-BLAST profile, secondary structure predicted with PSI-PRED, and a binary code that indicates the position of a given residue with respect to sequence termini. Based on empirical evaluation with a standard benchmark dataset, the proposed method obtains a mean absolute error (MAE) of 0.143, which corresponds to a 6% error rate reduction when compared with the best performing competing method, which obtains 0.152 MAE on this dataset.
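The reported improvement can be checked with a little arithmetic. The sketch below computes MAE over per-residue RSA values on a 0 to 1 scale, and the relative reduction between the two MAE figures quoted above. The helper names `mean_absolute_error` and `relative_error_reduction` are hypothetical, and the toy RSA values are made up for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all residues."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_error_reduction(baseline_mae, new_mae):
    """Fraction by which new_mae improves on baseline_mae."""
    return (baseline_mae - new_mae) / baseline_mae


# MAE values quoted in the abstract: 0.152 (best competitor), 0.143 (proposed)
reduction = relative_error_reduction(0.152, 0.143)
print(f"{reduction:.1%}")  # roughly 5.9%, which the abstract rounds to 6%

# Made-up RSA values, only to show how MAE is computed per residue
toy_actual = [0.10, 0.50, 0.90]
toy_predicted = [0.20, 0.40, 0.70]
toy_mae = mean_absolute_error(toy_actual, toy_predicted)
```

The reduction is relative to the competing method's error: (0.152 - 0.143) / 0.152.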