2020
DOI: 10.1101/2020.06.02.130955
Preprint

SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes

Abstract: Despite its overwhelming clinical importance for understanding and mitigating the COVID-19 pandemic, the protein-coding gene content of the SARS-CoV-2 genome remains unresolved, with the function and even protein-coding status of many hypothetical proteins unknown and often conflicting among different annotations, thus hindering efforts for systematic dissection of its biology and the impact of recent mutations. Comparative genomics is a powerful approach for distinguishing protein-coding versus non-coding fun…

Cited by 50 publications (64 citation statements)
References 60 publications
“…Analysis of the conservation of these out-of-frame iORFs in SARS-CoV and in related viruses (Sarbecoviruses) revealed 3a.iORF1 is highly conserved in Sarbecoviruses (Table S6). This ORF was also identified by three independent comparative genomic studies that demonstrate this ORF has a significant purifying selection signature, implying it is a functional polypeptide [28][29][30]. In combination with our expression measurements, these findings indicate this internal ORF is a novel and likely functional transmembrane protein, conserved throughout sarbecoviruses and as was suggested by Jungreis et al 30 should be named ORF3c.…”
Section: Main (mentioning)
confidence: 83%
“…The internal S-ORF is situated just downstream of the ORF-S AUG, suggesting ribosomes might initiate translation via leaky scanning. This region in the S-protein shows extremely-rapid evolution 30 but in the SARS-CoV-2 isolates that have been sequenced its coding capacity is not impaired. Future work will have to delineate if this ORF, which is highly expressed (Figure 4B and Figure S20), represents a functional protein.…”
Section: Main (mentioning)
confidence: 99%
“…The positive and negative bases were matched by their nucleotide composition such that the proportions of each nucleotide in positive and negative bases were the same. As a comparison, we also used existing sequence constraint annotations learned from the sequence alignment identical to or containing the same strains as those provided to ConsHMM as input, which include PhastCons scores 4 , PhyloP scores 5 , and a five-way annotation based on mutation type and codon constraint in Sarbecoviruses 16 (Methods). For PhastCons and PhyloP scores, we additionally generated discrete versions of the annotation by binning the scores and evaluated these bins in an analogous way to state annotations (Methods).…”
Section: Using Conservation States To Predict SARS-CoV-2 Mutations (mentioning)
confidence: 99%
“…Specifically, for annotation of genes, genes were ranked based on their enrichment for positive training bases, which was later used to rank test bases. The same procedure was applied to an annotation of possible intergenic, synonymous, missense, and nonsense mutations and separately to a five-way annotation based on mutation type and codon constraint in Sarbecoviruses 16 . In each of these categorical predictors, if a test base was assigned to multiple categories, the category with the stronger enrichment for positive training bases was assigned to the test base.…”
Section: Prediction Of Genomic Bases Without Nonsingleton Mutations (mentioning)
confidence: 99%
“…For instance, high SHAPE reactivities were found at the loop region (G71-A75), indicating strong possibility of a single-stranded nature, while low SHAPE reactivities were found on nucleotides predicted to be base-paired, such as nucleotides from 54 to 58, and nucleotides from 90 to 94. We further performed SHAPE probing on an extended version of the 3' UTR RNA (extended 3ʹ UTR) that additionally includes ORF10 and the region immediately upstream from it (ORF10 may not be protein coding; Taiaroa et al, 2020; Jungreis et al, 2020) (Fig. 1A and S1 and Table 1).…”
Section: Model For the S2m Element In The Context Of The SARS-CoV-2 3 (mentioning)
confidence: 99%