Extended proteins such as calmodulin and troponin C have two globular terminal domains linked by a central region that is exposed to water and often acts as a function-regulating element. The mechanisms that stabilize the tertiary structure of extended proteins appear to differ greatly from those of globular proteins. Identifying such differences in physical properties of amino acid sequences between extended proteins and globular proteins can provide clues useful for identification of extended proteins from complete genomes including orphan sequences. In the present study, we examined the structure and amino acid sequence of extended proteins. We found that extended proteins have a large net electric charge, high charge density, and an even balance of charge between the terminal domains, indicating that electrostatic interaction is a dominant factor in stabilization of extended proteins. Additionally, the central domain exposed to water contained many amphiphilic residues. Extended proteins can be identified from these physical properties of the tertiary structure, which can be deduced from the amino acid sequence. Analysis of physical properties of amino acid sequences can provide clues to the mechanism of protein folding. Also, structural changes in extended proteins may be caused by formation of molecular complexes. Long-range effects of electrostatic interactions also appear to play important roles in structural changes of extended proteins.

Keywords: structural classification; extended protein; bioinformatics; structural genomics; mechanism of structural stabilization; physical properties of amino acid residues

Complete genomes include many orphan amino acid sequences, the functions and structures of which are unknown. Determination of the tertiary structure of the proteins corresponding to these sequences is important for elucidation of their function, because the structure of a protein is closely related to its function.
However, some types of proteins with unknown structures, such as nonglobular proteins containing flexible extended segments, are difficult to crystallize. Nonglobular soluble proteins with flexible segments are often involved in regulatory and cell-signaling functions (Wright and Dyson 1999; Ward et al. 2004). For example, calmodulin and troponin C appear to be nonglobular soluble extended proteins.

Such extended proteins provide very interesting problems involving structure and changes in structure. First, extended proteins lack some physical properties of globular proteins, and vice versa. The structure of single extended protein molecules, as exemplified by calmodulin and troponin C, consists of separate domains near each terminal linked by a central segment exposed to water (Babu et al. 1988; Houdusse et al. 1997; Chou et al. 2001). In contrast, globular proteins are stabilized by a hydrophobic core (Kauzmann 1959). Second, extended proteins often contain a flexible segment, which allows changes in their structure to occur. For example, the central part of the region linking the te...
The structures and physical properties of individual protein molecules have been extensively studied, but the general features of all proteins in a cell have hardly been investigated. The distribution of net electric charges of all proteins from the Saccharomyces cerevisiae proteome agreed well with a Gaussian distribution. The shift in charge distribution caused by protonation of histidine suggested that the proteins in a cell are buffered against pH changes. A comparison between the amino acid sequences from the proteome and randomly generated sequences indicated that electric charges in the real sequences are clustered. Analysis of the autocorrelation function of charged residues in the total proteome of S. cerevisiae showed a positive correlation of net charges in amino acid sequences with a characteristic length as long as 81 residues, leading to the conclusion that the interactions within proteins are repulsive on average.
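The two quantities named in this abstract, net charge and the autocorrelation of charged residues, can be illustrated with a minimal sketch. The charge assignment (K and R as +1, D and E as -1, H as +1 only when treated as protonated) and the unnormalized mean-product form of the autocorrelation are assumptions made for illustration; the abstract does not state the exact definitions used, and the function names are hypothetical.

```python
def residue_charges(seq, his_protonated=False):
    """Map each residue to an integer charge (assumed scheme, see lead-in)."""
    table = {"K": 1, "R": 1, "D": -1, "E": -1}
    if his_protonated:
        table["H"] = 1  # assumption: protonated histidine counts as +1
    return [table.get(aa, 0) for aa in seq]


def net_charge(seq, his_protonated=False):
    """Sum of per-residue charges over the whole sequence."""
    return sum(residue_charges(seq, his_protonated))


def charge_autocorrelation(seq, lag, his_protonated=False):
    """Mean product of charges at positions i and i + lag (no mean subtraction)."""
    q = residue_charges(seq, his_protonated)
    n = len(q) - lag
    if lag < 0 or n <= 0:
        raise ValueError("lag must be non-negative and shorter than the sequence")
    return sum(q[i] * q[i + lag] for i in range(n)) / n
```

Under this sketch, a positive value at a given lag means that residues separated by that distance tend to carry like charges, which is the sense in which the abstract speaks of repulsive interactions on average.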
The distribution of net electric charge of amino acid sequences from Drosophila melanogaster is compared with a Gaussian distribution to investigate the balance between randomness and selection in the process of evolution. The net electric charge follows a Gaussian-like distribution, with a slight but systematic deviation from the Gaussian distribution. This deviation is not observed for eleven subsets of proteins of similar size, and it is shown that the mean and variance of the Gaussian distribution appear to be linearly dependent on the size of proteins. The Gaussian distribution is centered around a charge density of approximately one positive charge per 100 residues, which, in comparison to the real distribution for random sequences, reveals some degree of charge correlation in the proteome of D. melanogaster. These findings suggest the possible involvement of a systematic selection mechanism in the evolutionary process.
The numbers of membrane proteins in the current genomes of various organisms provide an important clue about how the protein world has evolved from the aspect of membrane proteins. Numbers of membrane proteins were estimated by analyzing the total proteomes of 248 prokaryotes, using the SOSUI system for membrane proteins (Hirokawa et al., Bioinformatics, 1998) and SOSUI-signal for signal peptides (Gomi et al., CBIJ, 2004). The results showed that the ratio of membrane proteins to total proteins in these proteomes was almost constant: 0.228. When amino acid sequences were randomized, setting the probability of occurrence of all amino acids to 5%, the membrane protein/total protein ratio decreased to about 0.085. However, when the same simulation was carried out, but using the amino acid composition of the above proteomes, this ratio was 0.218, which is nearly the same as that of the real proteomic systems. This fact is consistent with the birth, death and innovation (BDI) model for membrane proteins, in which transmembrane segments emerge and disappear in accordance with random mutation events.
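The two randomization schemes described here, equal probability (5%) for each of the 20 amino acids versus sampling from the observed proteome composition, might be sketched as follows. This is only an illustration of the sampling step: the function names are hypothetical, and the classification of the generated sequences (done with SOSUI in the study) is not reproduced.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequences):
    """Fraction of each standard amino acid across a collection of sequences."""
    counts = Counter(aa for s in sequences for aa in s if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def random_sequence(length, weights=None, rng=random):
    """Generate a random sequence.

    weights=None gives the uniform scheme (each residue with probability 1/20 = 5%);
    otherwise residues are drawn according to the supplied composition.
    """
    if weights is None:
        return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
    probs = [weights[aa] for aa in AMINO_ACIDS]
    return "".join(rng.choices(AMINO_ACIDS, weights=probs, k=length))
```

Passing a seeded `random.Random` instance as `rng` makes the generated sequences reproducible.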
Proteins with a charge periodicity of 28 residues (PCP28) were found recently in the human proteome, and many of the annotated PCP28 were located in the nucleus (Ke et al., Jpn. J. Appl. Phys. 2007). The physical properties of the amino acid sequences were analyzed to detect the physicochemical differences between the nuclear and cytoplasmic PCP28 and to develop a software system to classify the two types of PCP28. A significant difference in the global parameters from the entire sequence and the local parameters around a segment with the highest positive charge density was found between the nuclear and cytoplasmic PCP28. The global classification score included the densities of proline and cysteine, and the negative charge density, while the local score included the symmetry of the charge distribution, the density of cysteine, and the positive charge density. A prediction system was developed using the global and local scores, which possessed a sensitivity and specificity of 92% and 88%, respectively. The mechanism of translocation of proteins to the nucleus is discussed using the parameters relevant to the predictive system.