The COVID-19 pandemic has been characterised by sequential variant-specific waves shaped by viral, individual human and population factors. SARS-CoV-2 variants are defined by their unique combinations of mutations, and the virus has clearly adapted to human infection since its emergence in 2019. Here we use machine learning models to identify shared signatures, i.e., common underlying mutational processes, and link these to the subset of mutations that define the variants of concern (VOCs). First, we applied regression models to the global SARS-CoV-2 genomes and associated metadata to determine how viral properties and public health measures have influenced the magnitude of waves, as measured by the number of infection cases, in different geographic locations. This analysis showed that, as expected, public health measures, and not virus properties alone, are associated with the rise and fall of regional reported SARS-CoV-2 infection numbers. This impact varies geographically, which we attribute to intrinsic differences such as vaccine coverage, testing and sequencing capacity, and the effectiveness of government stringency. In terms of underlying evolutionary change, we used non-negative matrix factorisation to identify three distinct mutational signatures in the SARS-CoV-2 genomes, each unique in its substitution patterns and exposures. Signatures 0, 1 and 3 were biased to C→T, T→C/A→G and G→T point mutations, as would be expected of the host antiviral molecules APOBEC, ADAR and ROS, respectively. We also observe a shift during the pandemic in relative mutational signature activity, from predominantly APOBEC-like changes to an increasingly high proportion of changes consistent with ADAR editing. This could represent changes in how the virus and the host immune response interact, and indicates how SARS-CoV-2 may continue to accumulate mutations in the future.
Linkage of the detected mutational signatures to the VOC-defining amino acid substitutions indicates that the majority of SARS-CoV-2's evolutionary capacity is likely to be associated with the action of host antiviral molecules rather than virus replication errors.
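As an illustration of the signature-extraction step, the following is a minimal sketch of non-negative matrix factorisation applied to a matrix of substitution counts. It is not the pipeline used in the study: the substitution categories, the toy counts, the function names and the choice of Lee-Seung multiplicative updates are all assumptions made for the example. The matrix of counts (samples by substitution types) is approximated as the product of an exposure matrix and a signature matrix, both constrained to be non-negative.

```python
import random

# Hypothetical substitution categories, chosen only for illustration.
SUBSTITUTIONS = ["C>T", "T>C", "A>G", "G>T", "G>A", "C>A"]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of the reconstruction residual V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh)) ** 0.5


def nmf(v, k, n_iter=500, seed=0, eps=1e-9):
    """Factorise V (n x m) into W (n x k) and H (k x m), all non-negative,
    using multiplicative updates for the Frobenius objective."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


# Toy data (invented, not from the study): three made-up signatures, each
# dominated by one kind of substitution, mixed in five made-up samples.
true_signatures = [
    [0.70, 0.10, 0.10, 0.05, 0.03, 0.02],  # mostly C>T
    [0.05, 0.45, 0.40, 0.05, 0.03, 0.02],  # mostly T>C / A>G
    [0.05, 0.05, 0.05, 0.75, 0.05, 0.05],  # mostly G>T
]
true_exposures = [
    [90, 5, 5],
    [60, 30, 10],
    [40, 50, 10],
    [20, 70, 10],
    [30, 30, 40],
]
counts = matmul(true_exposures, true_signatures)

exposures, signatures = nmf(counts, k=3)

# Rescale each signature to sum to 1 so it reads as a substitution profile.
normalised = [[x / sum(row) for x in row] for row in signatures]

for idx, row in enumerate(normalised):
    dominant = SUBSTITUTIONS[row.index(max(row))]
    print(f"signature {idx}: dominant substitution {dominant}")
print("reconstruction error:", frobenius_error(counts, exposures, signatures))
```

In this sketch each row of `signatures` plays the role of a mutational signature and each row of `exposures` gives the contribution of every signature to one sample. Tracking exposures over sampling time is one way a shift in relative signature activity, such as the one described above, could be examined. The order and scaling of the recovered signatures are arbitrary, so the indices printed here carry no meaning beyond this toy run.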