Structure prediction and computational protein design should benefit from accurate solvent models. We have applied implicit solvent models to two problems that are central to this area. First, we performed sidechain placement for 29 proteins, using a solvent model that combines a screened Coulomb term with an Accessible Surface Area term (CASA model). With optimized parameters, the prediction quality is comparable with earlier work that omitted electrostatics and solvation altogether. Second, we computed the stability changes associated with point mutations involving ionized sidechains. For over 1000 mutations, including many fully or partly buried positions, we compared CASA and two generalized Born models (GB) with a more accurate model, which solves the Poisson equation of continuum electrostatics numerically. CASA predicts the correct sign and order of magnitude of the stability change for 81% of the mutations, compared to 97% with the best GB. We also considered 140 mutations for which experimental data are available. Comparing to experiment requires additional assumptions about the unfolded protein structure, protein relaxation in response to the mutations, and contributions from the hydrophobic effect. With a simple, commonly-used unfolded state model, the mean unsigned error is 2.1 kcal/mol with both CASA and the best GB. Overall, the electrostatic model is not important for sidechain placement; CASA and GB are equivalent for surface mutations, while GB is far superior for fully or partly buried positions. Thus, for problems like protein design that involve all these aspects, the most recent GB models represent an important step forward. Along with the recent discovery of efficient, pairwise implementations of GB, this will open new possibilities for the computational engineering of proteins.
BackgroundThe genetic diversity observed among bacteriophages remains a major obstacle for the identification of homologs and the comparison of their functional modules. In the structural module, although several classes of homologous proteins contributing to the head and tail structure can be detected, proteins of the head-to-tail connection (or neck) are generally more divergent. Yet, molecular analyses of a few tailed phages belonging to different morphological classes suggested that only a limited number of structural solutions are used in order to produce a functional virion. To challenge this hypothesis and analyze proteins diversity at the virion neck, we developed a specific computational strategy to cope with sequence divergence in phage proteins. We searched for homologs of a set of proteins encoded in the structural module using a phage learning database.ResultsWe show that using a combination of iterative profile-profile comparison and gene context analyses, we can identify a set of head, neck and tail proteins in most tailed bacteriophages of our database. Classification of phages based on neck protein sequences delineates 4 Types corresponding to known morphological subfamilies. Further analysis of the most abundant Type 1 yields 10 Clusters characterized by consistent sets of head, neck and tail proteins. We developed Virfam, a webserver that automatically identifies proteins of the phage head-neck-tail module and assign phages to the most closely related cluster of phages. This server was tested against 624 new phages from the NCBI database. 93% of the tailed and unclassified phages could be assigned to our head-neck-tail based categories, thus highlighting the large representativeness of the identified virion architectures. Types and Clusters delineate consistent subgroups of Caudovirales, which correlate with several virion properties.ConclusionsOur method and webserver have the capacity to automatically classify most tailed phages, detect their structural module, assign a function to a set of their head, neck and tail genes, provide their morphologic subtype and localize these phages within a “head-neck-tail” based classification. It should enable analysis of large sets of phage genomes. In particular, it should contribute to the classification of the abundant unknown viruses found on assembled contigs of metagenomic samples.Electronic supplementary materialThe online version of this article (doi:10.1186/1471-2164-15-1027) contains supplementary material, which is available to authorized users.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.