BackgroundThe number of available structures of large multi-protein assemblies is quite small. Such structures provide phenomenal insights on the organization, mechanism of formation and functional properties of the assembly. Hence detailed analysis of such structures is highly rewarding. However, the common problem in such analyses is the low resolution of these structures. In the recent times a number of attempts that combine low resolution cryo-EM data with higher resolution structures determined using X-ray analysis or NMR or generated using comparative modeling have been reported. Even in such attempts the best result one arrives at is the very course idea about the assembly structure in terms of trace of the Cα atoms which are modeled with modest accuracy.Methodology/Principal FindingsIn this paper first we present an objective approach to identify potentially solvent exposed and buried residues solely from the position of Cα atoms and amino acid sequence using residue type-dependent thresholds for accessible surface areas of Cα. We extend the method further to recognize potential protein-protein interface residues.Conclusion/ SignificanceOur approach to identify buried and exposed residues solely from the positions of Cα atoms resulted in an accuracy of 84%, sensitivity of 83–89% and specificity of 67–94% while recognition of interfacial residues corresponded to an accuracy of 94%, sensitivity of 70–96% and specificity of 58–94%. Interestingly, detailed analysis of cases of mismatch between recognition of interface residues from Cα positions and all-atom models suggested that, recognition of interfacial residues using Cα atoms only correspond better with intuitive notion of what is an interfacial residue. Our method should be useful in the objective analysis of structures of protein assemblies when positions of only Cα positions are available as, for example, in the cases of integration of cryo-EM data and high resolution structures of the components of the assembly.
Expression of synthetic proteins from intergenic regions of and their functional association was recently demonstrated (Dhar et al. in J Biol Eng 3:2, 2009. doi:10.1186/1754-1611-3-2). This gave birth to the question: if one can make 'user-defined' genes from non-coding genome-how big is the artificially translatable genome? (Dinger et al. in PLoS Comput Biol 4, 2008; Frith et al. in RNA Biol 3(1):40-48, 2006a; Frith et al. in PLoS Genet 2(4):e52, 2006b). To answer this question, we performed a bioinformatics study of all reported intergenic sequences, in search of novel peptides and proteins, unexpressed by nature. Overall, 2500 intergenic sequences were computationally translated into 'protein sequence equivalents' and matched against all known proteins. Sequences that did not show any resemblance were used for building a comprehensive profile in terms of their structure, function, localization, interactions, stability so on. A total of 362 protein sequences showed evidence of stable tertiary conformations encoded by the intergenic sequences of genome. Experimental studies are underway to confirm some of the key predictions. This study points to a vast untapped repository of functional molecules lying undiscovered in the non-expressed genome of various organisms.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.