2020
DOI: 10.1101/gr.266221.120

Mapping genome variation of SARS-CoV-2 worldwide highlights the impact of COVID-19 super-spreaders

Abstract: The human pathogen severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is responsible for the major pandemic of the twenty-first century. We analyzed more than 4700 SARS-CoV-2 genomes and associated metadata retrieved from public repositories. SARS-CoV-2 sequences have a high sequence identity (>99.9%), which drops to >96% when compared to bat coronavirus genome. We built a mutation-annotated reference SARS-CoV-2 phylogeny with two main macro-haplogroups, A and B, both of Asian origin, and more …

citations
Cited by 101 publications
(108 citation statements)
references
References 58 publications
“…2a). This phylogeny is a rooted tree of SARS-CoV-2, which has been challenging to infer reliably [1][2][3][4][5][6][7][8]. It shows that all the early novel coronavirus lineages were established by three μ mutations, all of which were first seen in genomes sampled in China.…”
Section: Molecular Phylogeny of SARS-CoV-2 Genomes
mentioning
confidence: 99%
“…The early evolutionary history and order of mutational events that arose during the pandemic remain unresolved, even months after the initial detection of SARS-CoV-2 as the causal agent of COVID-19 and the acquisition of tens of thousands of genomes [1][2][3][4][5][6][7][8]. Widely-recognized impediments include a limited number of phylogenetically informative variants in genomes, the ubiquity of sequencing errors, and the lack of a closely-related outgroup sequence, all of which have complicated the inference and rooting of the SARS-CoV-2 phylogeny [1][2][3][4][5][6][7][8]. Consequently, the traditional approach to the analysis of viral spread and evolution in which a reliable genome phylogeny is first inferred, and then observed differences among sequences are mapped site-by-site, has not been able to stage the earliest mutational events in the evolution of the novel coronavirus [9,10].…”
Section: Introduction
mentioning
confidence: 99%
“…However, the events leading to the early spread of the viruses are still unclear, in part because there is substantial uncertainty about the rooting of the SARS-CoV-2 phylogeny. The importance in identifying the origin of the virus has prompted other analyses on the uncertainty of rooting the phylogeny (Gomez-Carballa et al 2020; Morel et al 2020). Previous analyses have reached different conclusions about the rooting of the phylogeny.…”
mentioning
confidence: 99%
“…In particular, the Global Initiative on Sharing all Individual Data (GISAID; https://www.gisaid.org/) offers full open access to SARS-CoV-2 genomic data provided by hundreds of laboratories worldwide. The scientific community can analyze the whole-genome sequences available in these resources to make inferences about SARS-CoV-2 genetic variation and its phylogenetic roots, natural selection, and phylodynamics (Boni et al, 2020; Forster et al, 2020; Gómez-Carballa et al, 2020a, 2020b; Gudbjartsson et al, 2020; Rambaut et al, 2020; Rockett et al, 2020; Van Dorp et al, 2020; Yu et al, 2020). Furthermore, the fact that the coronavirus genome is only ~30 kb allows for relatively easy computational treatment.…”
mentioning
confidence: 99%
“…We propose that this redundancy could have been eliminated by simply inspecting the SARS-CoV-2 phylogeny. For instance, the phylogenetic tree skeleton in Figure 1A (inspired by Figure 3 in Gómez-Carballa et al (2020a)), which includes the initial 20 ISMs signature, shows that: variants C8782T–T28144C together define clade B (11 nt compressed ISMs CCTGCCAAGGG in Zhao et al (2020)); the sequence motif C241T–C3037T–A23403G characterizes clade A2 (CCCGCCAGGGG, immediate ancestral node of the most successful SARS-CoV-2 variant outside Asia, which most likely originated in Italy (Gómez-Carballa et al, 2020a)); and G28881A–G28882A–G28883C defines haplogroup A2a4 (CCCGCCAGGGA, one of the most important sub-branches of A2 (CCCGCCAGGGG); here we favored the single multi-nucleotide polymorphism (MNP) event GGG28881AAC for nomenclature, as justified in Gómez-Carballa et al (2020a)). In addition, their entropy-based algorithm sub-optimally prioritized positions that are diagnostic of nodes located along the same evolutionary pathway, but which add very little to the overall discrimination power of the ISMs set: A1 (CCCGCCAAGTG) makes up 4.7% of the total database, while its sub-lineage A1a (CCCTTCAAGTG) represents 4.3% and A1a3 represents 1.8% (Figure 1A).…”
mentioning
confidence: 99%
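
The haplogroup definitions quoted in the statement above can be read as a lookup from diagnostic variants to clade names. The following is only a minimal sketch of that idea, not the method of any of the cited papers: the variant strings are taken from the statement, while the function name `assign_haplogroup`, the dictionary layout, and the rule "pick the most specific matching clade" are assumptions made for illustration. It also assumes that A2a4 carries the A2 motif, since the statement describes it as a sub-branch of A2.

```python
# Diagnostic variants as quoted in the citation statement above.
# Assumption: A2a4 inherits the A2 motif because it is described
# as a sub-branch of A2.
DIAGNOSTIC_VARIANTS = {
    "B": {"C8782T", "T28144C"},
    "A2": {"C241T", "C3037T", "A23403G"},
    "A2a4": {"C241T", "C3037T", "A23403G",
             "G28881A", "G28882A", "G28883C"},
}


def assign_haplogroup(observed_variants):
    """Return the most specific clade whose diagnostic variants are all
    present in observed_variants, or None if no listed clade matches.

    Hypothetical helper: "most specific" is approximated by the clade
    with the largest set of diagnostic variants.
    """
    observed = set(observed_variants)
    best_name = None
    best_size = 0
    for name, required in DIAGNOSTIC_VARIANTS.items():
        # A clade matches only if every diagnostic variant is observed.
        if required <= observed and len(required) > best_size:
            best_name = name
            best_size = len(required)
    return best_name


if __name__ == "__main__":
    print(assign_haplogroup({"C8782T", "T28144C"}))
    print(assign_haplogroup({"C241T", "C3037T", "A23403G"}))
```

A return value of `None` means only that none of the three listed motifs is fully present; it says nothing about membership in other haplogroups, which this sketch does not model.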