We introduce a novel mathematical model to analyze the effect of removing non-pharmaceutical interventions on the spread of COVID-19 as a function of disease testing rate. We find that the size of the epidemic peak depends strongly on when interventions are removed. We show that it is essential for predictive models to explicitly capture transmission from asymptomatic carriers, and important to obtain precise information on asymptomatic transmission through testing. The asymptomatic reservoir, reported to account for as much as 85% of transmission, will contribute to resurgence of the epidemic if public health interventions are removed too soon. More basic models that fail to capture asymptomatic transmission can produce large errors in predicted clinical caseload or in fitted epidemiological parameters and may therefore be unreliable in estimating the risk of a second wave based on when interventions are terminated.
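To make the role of an asymptomatic compartment and of intervention removal time concrete, here is a minimal sketch of a compartmental model. It is not the authors' model: the compartment structure (susceptible, asymptomatic, symptomatic, recovered), the forward-Euler integration, and every parameter value (`beta`, `reduction`, `asym_frac`, `gamma`, population size) are illustrative assumptions, and testing rate is not modeled at all.

```python
def peak_symptomatic(release_day, days=300.0, dt=0.1, beta=0.4, reduction=0.8,
                     asym_frac=0.5, gamma=0.1, n=1_000_000.0, i0=10.0):
    """Return (peak symptomatic count, final total population).

    Hypothetical S-A-I-R model: a fraction `asym_frac` of new infections is
    asymptomatic (A), the rest symptomatic (I); both classes transmit equally.
    Interventions scale transmission by (1 - reduction) until `release_day`.
    All parameter values are placeholders, not fitted estimates.
    """
    s, a, i, r = n - i0, 0.0, i0, 0.0
    peak = i
    for step in range(int(days / dt)):
        t = step * dt
        # Transmission rate is reduced while interventions are in place.
        b = beta * (1.0 - reduction) if t < release_day else beta
        new_inf = b * s * (a + i) / n * dt
        rec_a, rec_i = gamma * a * dt, gamma * i * dt
        s -= new_inf
        a += asym_frac * new_inf - rec_a
        i += (1.0 - asym_frac) * new_inf - rec_i
        r += rec_a + rec_i
        peak = max(peak, i)
    return peak, s + a + i + r


if __name__ == "__main__":
    for day in (0.0, 30.0, 90.0, float("inf")):
        peak, _ = peak_symptomatic(day)
        print(f"release day {day}: peak symptomatic = {peak:.0f}")
```

Comparing runs with different `release_day` values shows how the symptomatic peak responds to the timing of removal under these assumed parameters; setting `asym_frac` to 0 gives the kind of simpler model without an asymptomatic reservoir that the abstract cautions against.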
SARS-CoV-2 genomic sequencing efforts have scaled dramatically to address the current global pandemic and aid public health. However, autonomous genome annotation of SARS-CoV-2 genes, proteins, and domains is not readily accomplished by existing methods and results in missing or incorrect sequences. To overcome this limitation, we developed a novel semi-supervised pipeline for automated gene, protein, and functional domain annotation of SARS-CoV-2 genomes that differentiates itself by not relying on a single reference genome and by overcoming atypical genomic traits that challenge traditional bioinformatic methods. Using our method, we analyzed an initial corpus of 66,000 SARS-CoV-2 genome sequences collected from labs across the world and identified the comprehensive set of known proteins, including Replicase polyprotein 1ab (with its transcriptional slippage site), with 98.5% set membership accuracy and 99.1% accuracy in length prediction compared to proteome references. Compared to other published tools, such as Prokka (base) and VAPiD, our method yielded 6.4- and 1.8-fold more protein annotations, respectively. Our method generated 13,000,000 gene, protein, and domain sequences, some conserved across time and geography and others representing emerging variants. We observed 3,362 non-redundant sequences per protein on average within this corpus and described the key D614G and N501Y variants spatiotemporally in the initial genome corpus. For spike glycoprotein domains, we achieved greater than 97.9% sequence identity to references and characterized receptor binding domain variants. We further demonstrated the robustness and extensibility of our method on an additional 4,000 variant-diverse genomes containing all named variants of concern and interest as of August 2021. In this cohort, we identified all keystone spike glycoprotein mutations in our predicted protein sequences with greater than 99% accuracy and demonstrated high accuracy of the protein and domain annotations. This work comprehensively presents the molecular targets to refine biomedical interventions for SARS-CoV-2 with a scalable, high-accuracy method to analyze newly sequenced infections as they arise.
SARS-CoV-2 genomic sequencing efforts have scaled dramatically to address the current global pandemic and aid public health. In this work, we analyzed a corpus of 66,000 SARS-CoV-2 genome sequences. We developed a novel semi-supervised pipeline for automated gene, protein, and functional domain annotation of SARS-CoV-2 genomes that differentiates itself by not relying on a single reference genome and by overcoming atypical genome traits. Using this method, we identified the comprehensive set of known proteins, including Replicase polyprotein 1ab (with its transcriptional slippage site), with 98.5% set membership accuracy and 99.1% accuracy in length prediction compared to proteome references. Compared to other published tools such as Prokka (base) and VAPiD, our method yielded 6.4- and 1.8-fold more protein annotations, respectively. Our method generated 13,000,000 molecular target sequences, some conserved across time and geography and others representing emerging variants. We observed 3,362 non-redundant sequences per protein on average within this corpus and described the key D614G and N501Y variants spatiotemporally. For spike glycoprotein domains, we achieved greater than 97.9% sequence identity to references and characterized receptor binding domain variants. Here, we comprehensively present the molecular targets to refine biomedical interventions for SARS-CoV-2 with a scalable, high-accuracy method to analyze newly sequenced infections.
Horizontal gene transfer mediated by integrative and conjugative elements (ICE) is considered an important evolutionary mechanism of bacteria. It allows organisms to quickly evolve new phenotypic properties, including antimicrobial resistance (AMR) and virulence. The rate of ICE-mediated cargo gene exchange has not yet been comprehensively studied within and between bacterial taxa. In this paper we report a big data analysis of ICE and associated cargo genes across over 200,000 bacterial genomes representing 1,345 genera. Our results reveal that half of bacterial genomes contain one or more known ICE features ("ICE genomes"), and that the associated genetic cargo may play an important role in the spread of AMR genes within and between bacterial genera. We identify 43 AMR genes that appear only in ICE genomes and never in non-ICE genomes. A further set of 95 AMR genes is found more than five times as often in ICE genomes as in non-ICE genomes. In contrast, only 29 AMR genes are observed more frequently (at least 5:1) in non-ICE genomes than in ICE genomes. Analysis of NCBI antibiotic susceptibility assay data reveals that ICE genomes are also over-represented amongst phenotypically resistant isolates, suggesting that ICE processes are critical for both genotypic and phenotypic AMR. These results, as well as the underlying big data resource, are important foundational tools for understanding bacterial evolution, particularly in relation to important bacterial phenotypes such as AMR.
• ICE genes are found in roughly 50% of bacterial genomes.
• The distribution of ICE features across genomes is variable, depending on the ICE feature.
• Most AMR genes are over-represented within genomes that also contain ICE features, compared to genomes that do not contain ICE features.
• Phenotypic resistance to antimicrobial drugs is much more common for isolates that contain ICE within their genomes, compared to isolates that do not.
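The ICE versus non-ICE comparison of AMR genes can be illustrated with a small sketch. The abstract does not say how frequencies were normalized, so this example assumes each gene's frequency is the number of genomes carrying it divided by the size of its group. The function name, the category labels, and all counts are hypothetical; only the five-fold threshold comes from the text.

```python
def classify_amr_gene(ice_hits, non_ice_hits, n_ice, n_non_ice, fold=5.0):
    """Label one AMR gene by its relative frequency in ICE vs non-ICE genomes.

    ice_hits / non_ice_hits: genomes in each group that carry the gene.
    n_ice / n_non_ice: total genomes in each group (assumed normalization).
    """
    if ice_hits == 0 and non_ice_hits == 0:
        return "absent"
    if non_ice_hits == 0:
        return "ICE-only"  # seen in ICE genomes, never in non-ICE genomes
    ice_freq = ice_hits / n_ice
    non_ice_freq = non_ice_hits / n_non_ice
    ratio = ice_freq / non_ice_freq
    if ratio > fold:
        return "ICE-enriched"  # more than `fold` times as frequent in ICE genomes
    if ratio <= 1.0 / fold:
        return "non-ICE-enriched"  # at least `fold`:1 in favor of non-ICE genomes
    return "similar"


if __name__ == "__main__":
    # Made-up counts for three hypothetical genes, 1,000 genomes per group.
    for name, ice, non_ice in [("geneA", 40, 0), ("geneB", 60, 10), ("geneC", 10, 60)]:
        print(name, classify_amr_gene(ice, non_ice, 1000, 1000))
```

Normalizing by group size matters when the two groups differ in size; with roughly half of genomes in each group, as the abstract reports, raw counts and frequencies would give similar ratios.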
Rapid tests for active SARS-CoV-2 infections rely on reverse transcription polymerase chain reaction (RT-PCR). RT-PCR uses reverse transcription of RNA into complementary DNA (cDNA) and amplification of specific DNA (primer and probe) targets using polymerase chain reaction (PCR). The technology makes rapid and specific identification of the virus possible based on nucleic acid sequence homology and is much faster than tissue culture or animal cell models. However, the technique can lose sensitivity over time as the virus evolves and the target sequences diverge from the selective primer sequences. Different primer sequences have been adopted in different geographic regions. As we rely on these existing RT-PCR primers to track and manage the spread of the coronavirus, it is imperative to understand how SARS-CoV-2 mutations, over time and across geographic regions, diverge from the primers in use today. In this study, we analyze the performance of the SARS-CoV-2 primers in use today by measuring the number of mismatches between primer sequences and genome targets over time and across geographic regions. We find that the number of mismatches is growing, increasing by 2% per month, and that viral sequences show high specificity to geographic location.
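The mismatch measurement described above can be sketched as a position-by-position comparison between a primer and its target site. The abstract does not specify the comparison method, so this example assumes a simple ungapped (Hamming) comparison on the forward strand, with no handling of degenerate bases or reverse complements. The function names and the short sequences are made up for illustration and are not real primers.

```python
def count_mismatches(primer: str, target: str) -> int:
    """Count positions where primer and target differ (ungapped comparison)."""
    if len(primer) != len(target):
        raise ValueError("primer and target must have the same length")
    return sum(p != t for p, t in zip(primer.upper(), target.upper()))


def best_binding_site(primer: str, genome: str) -> tuple[int, int]:
    """Slide the primer along the genome and return (start, mismatches)
    for the window with the fewest mismatches (first such window on ties)."""
    k = len(primer)
    if k == 0 or k > len(genome):
        raise ValueError("primer must be non-empty and no longer than the genome")
    sites = ((start, count_mismatches(primer, genome[start:start + k]))
             for start in range(len(genome) - k + 1))
    return min(sites, key=lambda site: site[1])


if __name__ == "__main__":
    toy_primer = "ACGT"        # hypothetical primer
    toy_genome = "TTACGATT"    # hypothetical genome fragment
    print(best_binding_site(toy_primer, toy_genome))
```

Applying `best_binding_site` to genomes grouped by collection month or region, then averaging the mismatch counts per group, would give the kind of temporal and geographic trend the abstract reports.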