Proteins with low-complexity domains continue to emerge as key players in both normal and pathological cellular processes. Although low-complexity domains are often grouped into a single class, individual low-complexity domains can differ substantially with respect to amino acid composition. These differences may strongly influence the physical properties, cellular regulation, and molecular functions of low-complexity domains. Therefore, we developed a bioinformatic approach to explore relationships between amino acid composition, protein metabolism, and protein function. We find that local compositional enrichment within protein sequences affects the translation efficiency, abundance, half-life, subcellular localization, and molecular functions of proteins on a proteome-wide scale. However, these effects depend upon the type of amino acid enriched in a given sequence, highlighting the importance of distinguishing between different types of low-complexity domains. Furthermore, many of these effects are discernible at amino acid compositions below those required for classification as low-complexity or statistically-biased by traditional methods and in the absence of homopolymeric amino acid repeats, indicating that thresholds employed by classical methods may not reflect biologically relevant criteria. Application of our analyses to composition-driven processes, such as the formation of membraneless organelles, reveals distinct composition profiles even for closely related organelles. Collectively, these results provide a unique perspective and detailed insights into relationships between amino acid composition, protein metabolism, and protein functions.
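To make the notion of local compositional enrichment concrete, the following minimal Python sketch scans a protein sequence with a sliding window and records, for each amino acid, the highest fraction of the window it occupies anywhere in the sequence. It is an illustration only and is not taken from the analysis described here: the function name `max_local_composition` and the default window size of 20 residues are assumptions chosen for the example.

```python
from collections import Counter


def max_local_composition(sequence, window=20):
    """Return {amino acid: highest fraction observed in any window}.

    Hypothetical helper for illustration; the window size is an assumed
    parameter, not a value taken from the text.
    """
    seq = sequence.upper()
    if not seq:
        return {}
    # Sequences shorter than the window are treated as a single window.
    w = min(window, len(seq))

    # Residue counts for the first window.
    counts = Counter(seq[:w])
    best = dict(counts)

    # Slide the window one residue at a time, updating counts incrementally.
    for i in range(w, len(seq)):
        incoming, outgoing = seq[i], seq[i - w]
        counts[incoming] += 1
        counts[outgoing] -= 1
        # Only the incoming residue's count can reach a new maximum.
        if counts[incoming] > best.get(incoming, 0):
            best[incoming] = counts[incoming]

    return {aa: n / w for aa, n in best.items()}


if __name__ == "__main__":
    # Toy sequence with a short glutamine-rich stretch.
    profile = max_local_composition("MKQQQQAL", window=4)
    for aa, frac in sorted(profile.items()):
        print(aa, round(frac, 2))
```

In a sketch like this, a protein could be assigned to a composition bin for a given amino acid according to its maximum local fraction, without requiring that the region pass a classical low-complexity threshold or contain a homopolymeric repeat.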
Author Summary
Low-complexity domains in protein sequences are regions that are composed of only a few amino acids in the protein "alphabet". These domains often have unique chemical properties and play important biological roles in both normal and disease-related processes. While a number of approaches have been developed to define low-complexity domains, these methods each possess conceptual limitations. Therefore, we developed a complementary approach that focuses on local amino acid composition (i.e., the amino acid composition within small regions of proteins). We find that high local composition of individual amino acids is associated with pervasive effects on protein metabolism, subcellular localization, and molecular function on a proteome-wide scale. Importantly, the nature of the effects depends on the type of amino acid enriched within the examined domains, and these effects are observable in the absence of classically-defined low-complexity (and related) domains. Furthermore, we define the compositions of proteins involved in the formation of membraneless, protein-rich organelles such as stress granules and P-bodies. Our results provide a coherent view and unprecedented resolution of the effects of local amino acid enrichment on protein biology.
the local Shannon entropy...