Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused a global pandemic of coronavirus disease in 2019 (COVID-19). Genome surveillance is a key method to track the spread of SARS-CoV-2 variants. Genetic diversity and evolution of SARS-CoV-2 were analyzed based on 260,673 whole-genome sequences, which were sampled from 62 countries between 24 December 2019 and 12 January 2021. We found that amino acid (AA) substitutions were observed in all SARS-CoV-2 proteins, and the top six proteins with the highest substitution rates were ORF10, nucleocapsid, ORF3a, spike glycoprotein, RNA-dependent RNA polymerase, and ORF8. Among 25,629 amino acid substitutions at 8484 polymorphic sites across the coding region of the SARS-CoV-2 genome, the D614G (93.88%) variant in spike and the P323L (93.74%) variant in RNA-dependent RNA polymerase were the dominant variants on six continents. As of January 2021, the genomic sequences of SARS-CoV-2 could be divided into at least 12 different clades. Distributions of SARS-CoV-2 clades were featured with temporal and geographical dynamics on six continents. Overall, this large-scale analysis provides a detailed mapping of SARS-CoV-2 variants in different geographic areas at different time points, highlighting the importance of evaluating highly prevalent variants in the development of SARS-CoV-2 antiviral drugs and vaccines.
On June 8, 2018, an NS3/4A protease inhibitor called danoprevir was approved in China to treat the infections of HCV genotype (GT) 1b – the most common HCV genotype worldwide. Based on phase 2 and 3 clinical trials, the 12-week regimen of ritonavir-boosted danoprevir (danoprevir/r) plus peginterferon alpha-2a and ribavirin offered 97.1% (200/206) of sustained virologic response at post-treatment week 12 (SVR12) in treatment-naïve non-cirrhotic patients infected with HCV genotype 1b. Adverse events such as anemia, fatigue, fever, and headache were associated with the inclusion of peginterferon alpha-2a and ribavirin in the danoprevir-based regimen. Moreover, drug resistance to danoprevir could be traced to amino acid substitutions (Q80K/R, R155K, D168A/E/H/N/T/V) near the drug-binding pocket of HCV NS3 protease. Despite its approval, the clinical use of danoprevir is currently limited to its combination with peginterferon alpha-2a and ribavirin, thereby driving its development towards interferon-free, ribavirin-free regimens with improved tolerability and adherence. In the foreseeable future, pan-genotypic direct-acting antivirals with better clinical efficacy and less adverse events will be available to treat HCV infections worldwide.
Despite the active development of SARS-CoV-2 surveillance methods (e.g., Nextstrain, GISAID, Pangolin), the global emergence of various SARS-CoV-2 viral lineages that potentially cause antiviral and vaccine failure has driven the need for accurate and efficient SARS-CoV-2 genome sequence classifiers. This study presents an optimized method that accurately identifies the viral lineages of SARS-CoV-2 genome sequences using existing schemes. For Nextstrain and GISAID clades, a template matching-based method is proposed to quantify the differences between viral clades and to play an important role in classification evaluation. Furthermore, to improve the typing accuracy of SARS-CoV-2 genome sequences, an ensemble model that integrates a combination of machine learning-based methods (such as Random Forest and Catboost) with optimized weights is proposed for Nextstrain, Pangolin, and GISAID clades. Cross-validation is applied to optimize the parameters of the machine learning-based method and the weight settings of the ensemble model. To improve the efficiency of the model, in addition to the one-hot encoding method, we have proposed a nucleotide site mutation-based data structure that requires less computational resources and performs better in SARS-CoV-2 genome sequence typing. Based on an accumulated database of >1 million SARS-CoV-2 genome sequences, performance evaluations show that the proposed system has a typing accuracy of 99.879%, 97.732%, and 96.291% for Nextstrain, Pangolin, and GISAID clades, respectively. A single prediction only takes an average of <20 ms on a portable laptop. Overall, this study provides an efficient and accurate SARS-CoV-2 genome sequence typing system that benefits current and future surveillance of SARS-CoV-2 variants.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.