2021
DOI: 10.1038/s41467-021-22905-7
SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes

Abstract: Despite its clinical importance, the SARS-CoV-2 gene set remains unresolved, hindering dissection of COVID-19 biology. We use comparative genomics to provide a high-confidence protein-coding gene set, characterize evolutionary constraint, and prioritize functional mutations. We select 44 Sarbecovirus genomes at ideally-suited evolutionary distances, and quantify protein-coding evolutionary signatures and overlapping constraint. We find strong protein-coding signatures for ORFs 3a, 6, 7a, 7b, 8, 9b, and a novel…

Cited by 151 publications (108 citation statements)
References 102 publications (169 reference statements)

“…ORF3c (originally known as iORF1) is a 41-codon long protein recently described as an accessory protein (Figure 1) (26, 27). ORF3c is encoded by a frame overlapping ORF3a (close to the 5’ end) and is conserved across Sarbecoviruses.…”
Section: Accessory Proteins of SARS-CoV-2
Citation type: mentioning
confidence: 99%
“…Analysis of the 3’ genome region of SARS-CoV-2 by the present method can be compared to that performed by Jungreis et al (2021b) using PhyloCSF, a computational tool to detect evolutionary signatures of protein-coding regions (Lin et al, 2011). By analysis of 44 Sarbecovirus genomes, Jungreis et al (2021b) found strong protein-coding signatures for the overlapping ORFs ORF3c and ORF9b, but not for the overlapping ORFs ORF2b, ORF3b, ORF3d and its isoform ORF3d-2, and ORF9c. CodScr + SeqComp differed from PhyloCSF because it predicted as candidate also ORF3d, in addition to ORF3c and ORF9b (Table 3), and because it detected ORF-Sh and ORF-Mh.…”
Section: Results
Citation type: mentioning
confidence: 99%
“…The percentage of SNVs in the S-gene (coding the S-glycoprotein) is nearly twice as large in spot A (and thus PAT A) compared with spot D reflecting an increase of the relative mutational load in this gene from PATs D, and EF towards PATs A-C paralleled by the decrease of the mutational load in ORF1a,b (Figure 6b and middle and Figure 6d for comparison with the respective percentages across all SNV and nucleotides of the SARS-CoV-2 genome). The percentage of SNVs of the N gene is large in PAT A, B and D indicating subtle shifts between the different genes as a result of evolutionary adaptation [39]. SNVs in N involve a B-cell epitope, suggesting immune-avoidance selection [39,40].…”
Section: Cartography of the Mutational Landscape
Citation type: mentioning
confidence: 99%
“…The percentage of SNVs of the N gene is large in PAT A, B and D indicating subtle shifts between the different genes as a result of evolutionary adaptation [39]. SNVs in N involve a B-cell epitope, suggesting immune-avoidance selection [39,40]. The S-genes divides into different parts, namely, S1 coding the 'spike' (pointing towards the host, see scheme in Figure 6e) and including the RBD (receptor binding domain), as well as the S2 region anchoring the protein in the virus membrane.…”
Section: Cartography of the Mutational Landscape
Citation type: mentioning
confidence: 99%