Capsid proteins often present a positively charged arginine-rich region at the N- and/or C-termini, which in some icosahedral viruses plays a fundamental role in genome packaging and particle stability. These sequences show little to no conservation at the amino acid level and are structurally dynamic, so they cannot be easily detected by common sequence or structure comparisons. As a result, the occurrence and distribution of positively charged protein domains across the viral and the overall protein universe are unknown. We developed a methodology based on the net charge calculation of discrete segments of the protein sequence that allows us to identify proteins containing amino acid stretches with an extremely high net charge. We observed that among all organisms, icosahedral viruses are especially enriched in extremely positively charged segments (Q ≥ +17), with a distinctive bias towards arginine instead of lysine. We used viral particle structural data to calculate the total electrostatic charge derived from the most positively charged protein segment of capsid proteins and correlated these values with the genome charge arising from the phosphates of each nucleotide. We obtained a positive correlation (r = 0.91, p-value < 0.0001) for a group of 17 viral families, corresponding to 40% of all families with icosahedral structures described so far. These data indicate that unrelated viruses with diverse genome types adopt a common underlying mechanism for capsid assembly and genome stabilization based on arginine-rich arms (R-arms). Outliers from a linear fit pointed to families with alternative strategies of capsid assembly and genome packaging.
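The segment-based net charge calculation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the window length of 30 residues, the function names, and the charge assignment (+1 for arginine and lysine, -1 for aspartate and glutamate, histidine treated as neutral) are assumptions made only for illustration. Only the threshold Q ≥ +17 is taken from the text.

```python
from typing import Tuple

# Assumed per-residue charges at neutral pH; histidine is treated as neutral.
CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}

# Threshold for "extremely positively charged" segments, as stated in the text.
Q_THRESHOLD = 17


def max_segment_charge(seq: str, window: int = 30) -> Tuple[int, int]:
    """Return (highest net charge, start index) over all sliding windows.

    The window length is a hypothetical parameter; sequences shorter than
    the window are scored as a single segment.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    w = min(window, len(seq))
    charges = [CHARGE.get(aa, 0) for aa in seq]
    current = sum(charges[:w])
    best, best_start = current, 0
    for start in range(1, len(seq) - w + 1):
        # Slide by one residue: add the entering charge, drop the leaving one.
        current += charges[start + w - 1] - charges[start - 1]
        if current > best:
            best, best_start = current, start
    return best, best_start


def arginine_fraction(segment: str) -> float:
    """Fraction of basic residues (R + K) in the segment that are arginine."""
    segment = segment.upper()
    n_arg = segment.count("R")
    n_basic = n_arg + segment.count("K")
    return n_arg / n_basic if n_basic else 0.0


def is_extremely_positive(seq: str, window: int = 30) -> bool:
    """True if any window reaches the Q >= +17 threshold."""
    q, _ = max_segment_charge(seq, window)
    return q >= Q_THRESHOLD
```

In this sketch, a protein would be flagged when `is_extremely_positive` returns `True`, and `arginine_fraction` applied to the best-scoring segment gives a simple measure of the arginine-versus-lysine bias.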
Significance Statement
Viruses can be characterized by the existence of a capsid, an intricate proteinaceous container that encases the viral genome. Therefore, capsid assembly and function are essential to viral replication. Here we specify virus families with diverse capsid structure and sequence, for which capsid packing capacity depends on a distinctive structural feature: a highly positively charged segment of amino acid residues, preferentially made of arginine. We also show that segments with the same characteristics are rarely found in cellular proteins. We have thus identified a conserved viral functional element that can be used to infer capsid assembly mechanisms and inspire the design of protein nanoparticles and broad-spectrum antiviral treatments.