Correspondence analysis of amino acid usage was applied to 14,815 complete proteins from the human genome. We found that three major factors influence the variability of the amino acid composition of these proteins, explaining 20.4%, 14.7%, and 9.9% of the total variability, respectively. The first trend is strongly correlated with the GC content of first and second codon positions and is also significantly correlated with the GC level of the corresponding flanking regions and introns. Therefore, the main force shaping amino acid usage among human proteins is the set of compositional constraints determined by the isochore in which each gene is embedded. The second trend correlates with the hydropathy of each protein and with the frequency of beta-strands. Finally, the third trend is strongly associated with the usage of Cys and the frequency of alpha-helices.
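The two quantities this abstract relates, GC content by codon position and per-protein amino acid composition, can be illustrated with a minimal Python sketch. The function names are hypothetical and are not taken from the paper. The sketch assumes an in-frame coding sequence and the 20 standard one-letter residue codes, and it does not perform the correspondence analysis itself.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes); other symbols are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def gc_by_codon_position(cds):
    """Return the GC fraction at codon positions 1, 2 and 3.

    Assumes `cds` starts in frame; a trailing partial codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    fractions = []
    for pos in range(3):
        bases = cds[pos:usable:3]  # every third base, starting at `pos`
        gc = sum(base in "GC" for base in bases)
        fractions.append(gc / len(bases) if bases else 0.0)
    return tuple(fractions)


def amino_acid_frequencies(protein):
    """Return the relative frequency of each standard amino acid."""
    counts = Counter(res for res in protein.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}


if __name__ == "__main__":
    gc1, gc2, gc3 = gc_by_codon_position("ATGGCCTAA")  # toy sequence
    print("GC12:", (gc1 + gc2) / 2, "GC3:", gc3)
    print("Cys frequency:", amino_acid_frequencies("MACC")["C"])
```

A table with one row of amino acid frequencies per protein would be the kind of input a correspondence analysis takes; the mean of the first two position-specific values gives a GC12-style measure to compare against the first axis.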
Recent investigations have shown that isochores are characterized by a 3-D structure which is primarily responsible for the topology of chromatin domains. More precisely, an analysis of human chromosome 21 demonstrated that low-heterogeneity, GC-poor isochores are characterized by the presence of oligo-Adenines that are intrinsically stiff, curved and unfavorable for nucleosome binding. This leads to a structure of the corresponding chromatin domains, the Lamina Associated Domains, or LADs, which is well suited for interaction with the lamina. In contrast, the high-heterogeneity, GC-rich isochores take the form of compositional peaks and valleys, characterized by increasing gradients of oligo-Guanines in the peaks and oligo-Adenines in the valleys, that lead to increasing nucleosome depletion in the corresponding chromatin domains, the Topologically Associating Domains, or TADs. These results encouraged us to investigate in detail the di- and tri-nucleotide profiles of 100 kb segments of chromosome 21, as well as those of the di- to octa-Adenines and di- to octa-Guanines in some representative regions of the chromosome. The results obtained show that the 3-D structures of isochores and chromatin domains depend not only upon oligo-Adenines and oligo-Guanines but also, to a lesser but definite extent, upon the majority of di- and tri-nucleotides. This conclusion has strong implications for the biological role of non-coding sequences.
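The kind of oligonucleotide profile described above can be sketched as counts per fixed-size window along a sequence. This is a minimal illustration only, not the authors' pipeline: the function names are hypothetical, overlapping matches are counted on the given strand only, and ambiguous bases such as N are left in the window length when computing GC.

```python
def count_kmer(seq, kmer):
    """Count overlapping occurrences of `kmer` in `seq`."""
    count = 0
    start = seq.find(kmer)
    while start != -1:
        count += 1
        start = seq.find(kmer, start + 1)  # step by 1 so overlaps are counted
    return count


def window_profile(seq, kmers, window=100_000):
    """Return GC fraction and k-mer counts for consecutive windows.

    The last window may be shorter than `window`.
    """
    seq = seq.upper()
    profile = []
    for begin in range(0, len(seq), window):
        chunk = seq[begin:begin + window]
        row = {
            "start": begin,
            "gc": sum(base in "GC" for base in chunk) / len(chunk),
        }
        for kmer in kmers:
            row[kmer] = count_kmer(chunk, kmer)
        profile.append(row)
    return profile


if __name__ == "__main__":
    # Toy sequence with a tiny window, for illustration only.
    for row in window_profile("AAAAGGGGCC", ["AA", "GGG"], window=5):
        print(row)
```

Passing oligo-A and oligo-G strings of length two to eight as `kmers` would give one profile column per oligonucleotide, which could then be compared with the GC column window by window.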
The compositional properties of the human genome have been extensively studied. These analyses focused mainly on isochores. With the availability of the human genome sequence and several molecular techniques, new studies were performed, showing that nucleotide composition is related to three processes: gene expression, replication and recombination. Nevertheless, these studies usually focused on regions at the sub-chromosomal level. Here we study the compositional differences among chromosomes, considering structural and functional aspects and using the chromosomes as the units of analysis. We show that: i) chromosomes are compositionally consistent units; ii) their GC content correlates with both their size and their location within the nucleus; and iii) the three processes mentioned above are linked to compositional properties at the chromosomal level. These results support the existence of a link between composition and spatial/structural/functional features of entire chromosomes. The evolutionary mechanisms and forces underlying these patterns remain open questions.