Phylogenomic early warning signals for SARS-CoV-2 epidemic waves

Drake, Kieran O.; Boyd, Olivia; Franceschi, Vinicius B.; Colquhoun, Rachel M.; Ellaby, Nicholas A.F.; Volz, Erik M.

doi:10.1016/j.ebiom.2023.104939

eBioMedicine

2024

Cited by 3 publications

References 31 publications

Unsupervised identification of significant lineages of SARS-CoV-2 through scalable machine learning methods

Cahuantzi, Lythgoe, Hall et al. 2024

Proc. Natl. Acad. Sci. U.S.A.

Since its emergence in late 2019, SARS-CoV-2 has diversified into a large number of lineages and caused multiple waves of infection globally. Novel lineages have the potential to spread rapidly and internationally if they have higher intrinsic transmissibility and/or can evade host immune responses, as has been seen with the Alpha, Delta, and Omicron variants of concern. They can also cause increased mortality and morbidity if they have increased virulence, as was seen for Alpha and Delta. Phylogenetic methods provide the “gold standard” for representing the global diversity of SARS-CoV-2 and to identify newly emerging lineages. However, these methods are computationally expensive, struggle when datasets get too large, and require manual curation to designate new lineages. These challenges provide a motivation to develop complementary methods that can incorporate all of the genetic data available without down-sampling to extract meaningful information rapidly and with minimal curation. In this paper, we demonstrate the utility of using algorithmic approaches based on word-statistics to represent whole sequences, bringing speed, scalability, and interpretability to the construction of genetic topologies. While not serving as a substitute for current phylogenetic analyses, the proposed methods can be used as a complementary, and fully automatable, approach to identify and confirm new emerging variants.

A phylogenetics and variant calling pipeline to support SARS-CoV-2 genomic epidemiology in the UK

Colquhoun, O’Toole, Hill et al. 2024

Virus Evolution

In response to the escalating SARS-CoV-2 pandemic, in March 2020 the COVID-19 Genomics UK (COG-UK) consortium was established to enable national-scale genomic surveillance in the United Kingdom. By the end of 2020, 49% of all SARS-CoV-2 genome sequences globally had been generated as part of the COG-UK programme and to date this system has generated more than 3 million SARS-CoV-2 genomes. Rapidly and reliably analysing this unprecedented number of genomes was an enormous challenge. To fulfil this need and to inform public health decision making, we developed a centralised pipeline that performs quality control, alignment and variant calling, and provides the global phylogenetic context of sequences. We present this pipeline and describe how we tailored it as the pandemic progressed to scale with the increasing amounts of data and to provide the most relevant analyses on a daily basis.

Phylogenetic signatures reveal multilevel selection and fitness costs in SARS-CoV-2

Bonetti Franceschi, Volz 2024

Wellcome Open Res

Background: Large-scale sequencing of SARS-CoV-2 has enabled the study of viral evolution during the COVID-19 pandemic. Some viral mutations may be advantageous to viral replication within hosts but detrimental to transmission, thus carrying a transient fitness advantage. By affecting the number of descendants, persistence times and growth rates of associated clades, these mutations generate localised imbalance in phylogenies. Quantifying these features in closely-related clades with and without recurring mutations can elucidate the tradeoffs between within-host replication and between-host transmission. Methods: We implemented a novel phylogenetic clustering algorithm (mlscluster, https://github.com/mrc-ide/mlscluster) to systematically explore time-scaled phylogenies for mutations under transient/multilevel selection. We applied this method to a SARS-CoV-2 time-calibrated phylogeny with >1.2 million sequences from England, and characterised these recurrent mutations that may influence transmission fitness across PANGO-lineages and genomic regions using Poisson regressions and summary statistics. Results: We found no major differences across two epidemic stages (before and after Omicron), PANGO-lineages, and genomic regions. However, spike, nucleocapsid, and ORF3a were proportionally more enriched for transmission fitness polymorphisms (TFP)-homoplasies than other proteins. We provide a catalog of SARS-CoV-2 sites under multilevel selection, which can guide experimental investigations within and beyond the spike protein. Conclusions: This study provides empirical evidence for the existence of important tradeoffs between within-host replication and between-host transmission shaping the fitness landscape of SARS-CoV-2. This method may be used as a fast and scalable means to shortlist large sequence databases for sites under putative multilevel selection which may warrant subsequent confirmatory analyses and experimental confirmation.
