Background
Variation of the betacoronavirus SARS-CoV-2 has been the bane of COVID-19 control. Documented variation includes point mutations, deletions, insertions, and recombination among closely or distantly related coronaviruses. Here, we describe another aspect of genome variation in beta- and alphacoronaviruses. It was first documented in an infectious SARS-CoV-2 isolate, obtained from 3 patients in Hong Kong, in which a 5′-untranslated region (5′-UTR) segment was present at the end of the ORF6 gene; in its new location, the segment is translated, yielding an ORF6 protein with a predicted modified carboxyl terminus. While comparing the amino acid sequences of translated ORF8 genes in the GenBank database, we found a subsegment of the same 5′-UTR-derived amino acid sequence modifying the distal end of ORF8 in an isolate from the United States, and we therefore carried out a systematic search.
Methods
Using as query sequences the nucleotide sequences of the genomic termini of coronaviruses and, for SARS-CoV-2, also their amino acid translations in three reading frames, we searched for 5′-UTR sequences in regions other than the 5′-UTR in SARS-CoV-2 and in reference strains of alpha-, beta-, gamma-, and deltacoronaviruses.
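The search strategy can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: exact substring matching stands in for an alignment-based database search, and the function names, the `utr_length` argument, and the window size `k` are hypothetical choices made for illustration only.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, frame):
    """Translate a nucleotide sequence starting at offset `frame` (0, 1, or 2)."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    # Codons with ambiguous bases are reported as 'X'.
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def three_frame_translations(seq):
    """Return the translations of `seq` in the three forward reading frames."""
    return [translate(seq, frame) for frame in range(3)]


def find_utr_segments(genome, utr_length, k=12):
    """Find exact copies of 5'-UTR k-mers located downstream of the 5'-UTR.

    `utr_length` (length of the 5'-UTR) and `k` (window size) are
    illustrative assumptions, not values taken from the study.
    Returns a list of (offset_in_utr, position_in_genome) pairs.
    """
    genome = genome.upper().replace("U", "T")
    utr = genome[:utr_length]
    hits = []
    for i in range(len(utr) - k + 1):
        kmer = utr[i:i + k]
        # Start searching at the end of the 5'-UTR so the UTR itself is skipped.
        pos = genome.find(kmer, utr_length)
        while pos != -1:
            hits.append((i, pos))
            pos = genome.find(kmer, pos + 1)
    return hits
```

For a protein-level query, the same idea applies: the three translations of the 5′-UTR returned by `three_frame_translations` would be compared against annotated protein sequences instead of the nucleotide genome. Exact matching will miss diverged copies, which is why an alignment tool would be needed in practice.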
Results
We report numerous genomic insertions of 5′-untranslated region sequences into coding regions of SARS-CoV-2, other betacoronaviruses, and alphacoronaviruses, but not delta- or gammacoronaviruses. To our knowledge, this is the first systematic description of such insertions. In many cases, these insertions would change viral protein sequences and further foster genomic flexibility and viral adaptability by placing transcription regulatory sequences in novel positions within the genome. Among human Embecovirus betacoronaviruses, for instance, between 65% and 100% of the surveyed sequences in publicly available databases contain inserted 5′-UTR sequences.
Conclusion
The intragenomic rearrangements involving 5′-untranslated region sequences described here, which in several cases affect highly conserved genes with a low propensity for recombination, may underlie the generation of variants homotypic with those of concern or interest and with potentially differing pathogenic profiles. Intragenomic rearrangements thus add to our appreciation of how variants of SARS-CoV-2 and other beta- and alphacoronaviruses may arise.