SARS-CoV-2 is currently causing major havoc worldwide with its efficient transmission and propagation. To track the emergence as well as the persistence of mutations during the early stage of the pandemic, a comparative analysis of SARS-CoV-2 whole proteome sequences has been performed by considering 31,389 manually curated whole genome sequences from 84 countries. Among the 7 highly recurring (percentage frequency ≥10%) mutations (Nsp2:T85I, Nsp6:L37F, Nsp12:P323L, Spike:D614G, ORF3a:Q57H, N protein:R203K and N protein:G204R), N protein:R203K and N protein:G204R are co-occurring (dependent) mutations. Nsp12:P323L and Spike:D614G often appear simultaneously. The highly recurring Spike:D614G, Nsp12:P323L and Nsp6:L37F as well as the moderately recurring (percentage frequency ≥1% and <10%) ORF3a:G251V and ORF8:L84S mutations have led to 4 major clades in addition to a clade that lacks highly recurring mutations. Further, the occurrence of ORF3a:Q57H&Nsp2:T85I, ORF3a:Q57H and N protein:R203K&G204R along with Nsp12:P323L&Spike:D614G has led to 3 additional sub-clades. Similarly, the occurrence of Nsp6:L37F and ORF3a:G251V together has led to the emergence of a sub-clade. Nonetheless, ORF8:L84S does not occur along with ORF3a:G251V or Nsp6:L37F. Intriguingly, ORF3a:G251V and ORF8:L84S are found to occur independently of the Nsp12:P323L and Spike:D614G mutations. These clades evolved during the early stage of the pandemic and have disseminated across several countries. Further, Nsp10 is found to be highly resistant to mutations; thus, it can be exploited for drug/vaccine development, and the corresponding gene sequence can be used for diagnosis. In short, the study reports the diversity of SARS-CoV-2 antigens across the globe during the early stage of the pandemic and facilitates the understanding of viral evolution.
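The recurrence tiers quoted in this abstract (highly recurring at ≥10%, moderately recurring at ≥1% and <10%) can be illustrated with a minimal sketch. The function name `classify_mutations`, the "low" label for everything under 1%, the toy counts, and the placeholder mutation `Hypothetical:A1B` are assumptions made for illustration only; they are not taken from the study and do not reproduce its pipeline or data.

```python
from collections import Counter

def classify_mutations(per_sequence_mutations, high=10.0, moderate=1.0):
    """Assign each mutation a recurrence tier from its percentage frequency.

    per_sequence_mutations: one collection of mutation labels per genome.
    The thresholds follow the abstract: >=10% is "high", >=1% and <10% is
    "moderate". The "low" label for the remainder is an assumption.
    """
    total = len(per_sequence_mutations)
    # Count each mutation at most once per sequence.
    counts = Counter(m for muts in per_sequence_mutations for m in set(muts))
    tiers = {}
    for mutation, n in counts.items():
        pct = 100.0 * n / total
        if pct >= high:
            tier = "high"
        elif pct >= moderate:
            tier = "moderate"
        else:
            tier = "low"
        tiers[mutation] = (pct, tier)
    return tiers

# Toy data (hypothetical counts, not the study's): 200 sequences in total.
toy_sequences = (
    [{"Spike:D614G"}] * 120
    + [{"ORF3a:G251V"}] * 10
    + [{"Hypothetical:A1B"}] * 1
    + [set()] * 69
)
tiers = classify_mutations(toy_sequences)
for name, (pct, tier) in sorted(tiers.items()):
    print(f"{name}: {pct:.1f}% -> {tier}")
```

Counting a mutation once per sequence keeps the percentage interpretable as "fraction of genomes carrying the mutation", which is how the thresholds in the abstract read.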
Nucleic acids exhibit a repertoire of conformational preferences depending on the sequence and environment. Circular dichroism (CD) is an important and valuable tool for monitoring such secondary structural conformations of nucleic acids. Nonetheless, the CD spectral diversity associated with these structures poses a challenge in obtaining quantitative information about the secondary structural content of a given CD spectrum. To this end, the extreme gradient boosting (XGBoost) decision-tree algorithm has been exploited here to predict the diverse secondary structures of nucleic acids. A curated library of 610 CD spectra corresponding to 16 different secondary structures of nucleic acids has been developed and used as a training dataset. For a test dataset of 242 CD spectra, the algorithm exhibited a prediction accuracy of 99%. For the sake of accessibility, the entire process is automated and implemented as a webserver called CD-NuSS (CD to nucleic acids secondary structure), which is freely accessible at https://www.iith.ac.in/cdnuss/. The XGBoost algorithm presented here may also be extended to identify hybrid nucleic acid topologies in the future.
To accelerate drug and vaccine development against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a comparative analysis of the SARS-CoV-2 proteome has been performed in two phases by considering 31,389 manually curated whole genome sequences from 84 countries. Among the 9 mutations that occur at a high significance (T85I-NSP2, L37F-NSP6, P323L-NSP12, D614G-spike, Q57H-ORF3a, G251V-ORF3a, L84S-ORF8, R203K-nucleocapsid and G204R-nucleocapsid), R203K-nucleocapsid and G204R-nucleocapsid are co-occurring mutations, and P323L-NSP12 and D614G-spike often appear simultaneously. Other notable variations that appear with a moderate to low significance are the M85-NSP1 deletion, the D268-NSP2 deletion, a 112 amino acid deletion in ORF8, a phenylalanine insertion amidst F34-F36 (NSP6) and several co-existing substitution/deletion mutations (I559V & P585S in NSP2, P504L & Y541C in NSP13, G82 & H83 deletions in NSP1 and K141, S142 & F143 deletions in NSP2). P323L-NSP12, D614G-spike, L37F-NSP6, L84S-ORF8 and the sequences lacking the highly significant mutations have led to 4 major SARS-CoV-2 clades. The top 5 countries bearing all the highly significant and the majority of the moderately significant mutations are the USA, England, Wales, Australia and Scotland. Further, the majority of the significant mutations evolved in the first phase and have already been transmitted around the globe, indicating positive selection pressure. Among the 26 SARS-CoV-2 proteins, the nucleocapsid protein, ORF3a, ORF8, RNA-dependent RNA polymerase and spike exhibit a higher heterogeneity compared with the rest of the proteins. However, NSP9, NSP10, NSP8, the envelope protein and NSP4 are highly resistant to mutations and can be exploited for drug/vaccine development.
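The distinction drawn above between co-occurring mutations and mutations that often appear simultaneously can be made concrete with a simple overlap measure. The sketch below uses the Jaccard index as a stand-in; this choice, the function name `jaccard_cooccurrence`, and the toy counts are assumptions for illustration and are not described in the abstract as the study's method.

```python
def jaccard_cooccurrence(per_sequence_mutations, a, b):
    """Jaccard index of two mutations over a set of genomes.

    Returns (# sequences with both a and b) / (# sequences with a or b).
    A value of 1.0 means the two mutations never appear without each other.
    """
    both = sum(1 for muts in per_sequence_mutations if a in muts and b in muts)
    either = sum(1 for muts in per_sequence_mutations if a in muts or b in muts)
    return both / either if either else 0.0

# Toy data (hypothetical counts, not the study's).
toy_sequences = (
    [{"R203K-nucleocapsid", "G204R-nucleocapsid"}] * 50  # always together
    + [{"P323L-NSP12", "D614G-spike"}] * 40              # usually together
    + [{"D614G-spike"}] * 10                             # spike mutation alone
    + [{"P323L-NSP12"}] * 5                              # NSP12 mutation alone
)

strict = jaccard_cooccurrence(
    toy_sequences, "R203K-nucleocapsid", "G204R-nucleocapsid"
)
frequent = jaccard_cooccurrence(toy_sequences, "P323L-NSP12", "D614G-spike")
print(f"R203K/G204R: {strict:.2f}")
print(f"P323L/D614G: {frequent:.2f}")
```

In this toy set the first pair scores 1.0 (strictly co-occurring), while the second scores below 1.0 because each mutation is also seen on its own.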
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) kills thousands of people worldwide every day, thus necessitating rapid development of countermeasures. Immunoinformatics analyses carried out here in search of immunodominant regions in recently identified SARS-CoV-2 unannotated open reading frames (uORFs) have identified eight linear B-cell, one conformational B-cell, 10 CD4+ T-cell, and 12 CD8+ T-cell promising epitopes. Among them, ORF9b B-cell and T-cell epitopes are the most promising, followed by M.ext and ORF3c epitopes. ORF9b40-48 (CD8+ T-cell epitope) is found to be highly immunogenic and antigenic with the highest allele coverage. Furthermore, it overlaps with four potent CD4+ T-cell epitopes. Structure-based B-cell epitope prediction has identified ORF9b61-68 to be immunodominant, which partially overlaps with one of the linear B-cell epitopes (ORF9b65-69). ORF3c CD4+ T-cell epitopes (ORF3c2-16, ORF3c3-17, and ORF3c4-18) and a linear B-cell epitope (ORF3c14-22) have also been identified as candidate epitopes. Similarly, M.ext and 7a.iORF1 (which overlap with M and ORF7a) proteins have promising immunogenic regions. By considering the level of antigen expression, four ORF9b and five M.ext epitopes are finally shortlisted as potent epitopes. Mutation analysis has further revealed that the shortlisted potent uORF epitopes are resistant to recurrent mutations. Additionally, four N-protein (expressed by canonical ORF) epitopes are found to be potent. Thus, SARS-CoV-2 uORF B-cell and T-cell epitopes identified here along with canonical ORF epitopes may aid in the design of a promising epitope-based polyvalent vaccine (when connected through appropriate linkers) against SARS-CoV-2. Such a vaccine can act as a bulwark against SARS-CoV-2, especially in the scenario of emergence of variants with recurring mutations in the spike protein.