Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused a worldwide crisis with profound effects on both public health and the economy. In order to combat the COVID-19 pandemic, research groups have shared viral genome sequence data through the Global Initiative on Sharing All Influenza Data (GISAID). Over the past year, ≈290,000 full SARS-CoV-2 proteome sequences have been deposited in GISAID. Here, we used these sequences to assess the rate of nonsynonymous mutations across the entire viral proteome. Our analysis shows that SARS-CoV-2 proteins are mutating at substantially different rates, with most of the viral proteins exhibiting little mutational variability. As anticipated, our calculations capture previously reported mutations that arose in the first months of the pandemic, such as D614G (Spike), P323L (NSP12), and R203K/G204R (Nucleocapsid), but they also identify more recent mutations, such as A222V and L18F (Spike) and A220V (Nucleocapsid), among others. Our comprehensive temporal and geographical analyses show two distinct periods with different proteome mutation rates: December 2019 to July 2020 and August to December 2020. Notably, some mutation rates differ by geography, primarily during the latter half of 2020 in Europe. Furthermore, our structure-based molecular analysis provides an exhaustive assessment of SARS-CoV-2 mutation rates in the context of the current set of 3D structures available for SARS-CoV-2 proteins. This emerging sequence-to-structure insight is beginning to illuminate the site-specific mutational (in)tolerance of SARS-CoV-2 proteins as the virus continues to spread around the globe.
The SARS-CoV-2 coronavirus has caused a worldwide crisis with profound effects on both healthcare and the economy. In order to combat the COVID-19 pandemic, research groups have shared viral genome sequence data through the GISAID initiative. We collected ∼223,000 full SARS-CoV-2 proteome sequences deposited in GISAID over one year and computationally profiled them for emergent nonsynonymous mutations. Our analysis shows that SARS-CoV-2 proteins are mutating at substantially different rates, with most viral proteins exhibiting little mutational variability. As anticipated, our calculations capture previously reported mutations that occurred in the first period of the pandemic, such as D614G (Spike), P323L (NSP12), and R203K/G204R (Nucleocapsid), but they also identify more recent mutations, such as A222V and L18F (Spike) and A220V (Nucleocapsid). Our comprehensive temporal and geographical analyses show two periods with different mutations in the SARS-CoV-2 proteome: December 2019 to June 2020 and July to November 2020. Some mutation rates also differ by geography; the main mutations of the second period occurred in Europe. Furthermore, our structure-based molecular analysis provides an exhaustive assessment of mutations in the context of 3D protein structure. Emerging sequence-to-structure data is beginning to reveal the site-specific mutational tolerance of SARS-CoV-2 proteins as the virus continues to spread around the globe.
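The per-site rate of nonsynonymous mutations described above can be illustrated with a short Python sketch. This is not the authors' pipeline, which the abstract does not detail; it is a minimal example under stated assumptions: every proteome sequence is already aligned to a reference (same length, no insertions or deletions), `X` marks an unknown residue to be skipped, and the rate at a site is taken to be the fraction of informative sequences carrying any non-reference amino acid. Mutations are labeled in the document's notation (reference residue, 1-based position, substitute, as in D614G). The function names `site_mutation_rates` and `rates_by_period` and the `cutoff` parameter are hypothetical.

```python
from collections import Counter
from datetime import date


def site_mutation_rates(reference, sequences):
    """Per-site nonsynonymous mutation rates against a reference protein.

    Hypothetical helper. Assumes every sequence is pre-aligned to `reference`
    (same length, no indels). 'X' marks an unknown residue and is skipped.
    Returns {1-based position: (rate, dominant substitution such as "D614G")}
    for every site where at least one sequence differs from the reference.
    """
    results = {}
    for i, ref_aa in enumerate(reference):
        # Only count sequences with an informative residue at this site.
        observed = [seq[i] for seq in sequences if seq[i] != "X"]
        if not observed:
            continue
        substitutions = Counter(aa for aa in observed if aa != ref_aa)
        if not substitutions:
            continue
        alt_aa, _ = substitutions.most_common(1)[0]
        # Rate = fraction of informative sequences carrying any non-reference residue.
        rate = sum(substitutions.values()) / len(observed)
        results[i + 1] = (rate, f"{ref_aa}{i + 1}{alt_aa}")
    return results


def rates_by_period(reference, records, cutoff):
    """Split (collection_date, sequence) records at `cutoff` and score each period.

    Hypothetical helper; `cutoff` is the first date of the later period.
    """
    early = [seq for collected, seq in records if collected < cutoff]
    late = [seq for collected, seq in records if collected >= cutoff]
    return (site_mutation_rates(reference, early),
            site_mutation_rates(reference, late))


if __name__ == "__main__":
    # Toy four-residue "proteome"; real inputs would be full aligned proteomes.
    records = [
        (date(2020, 3, 1), "ADLG"),
        (date(2020, 4, 1), "AGLG"),
        (date(2020, 9, 1), "AGLV"),
        (date(2020, 10, 1), "AGLV"),
    ]
    early, late = rates_by_period("ADLG", records, cutoff=date(2020, 8, 1))
    print(early)  # {2: (0.5, 'D2G')}
    print(late)   # {2: (1.0, 'D2G'), 4: (1.0, 'G4V')}
```

The period boundary is left as a parameter because where one period ends and the next begins depends on the dataset; the toy run simply uses August 1, 2020. Comparing the two returned dictionaries shows how a substitution can be present at low frequency early on and become dominant later, which is the kind of temporal shift the analysis above reports.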