Infectious disease monitoring on Oxford Nanopore Technologies (ONT) platforms offers rapid turnaround times and low cost, exemplified by well over half a million ONT SARS-CoV-2 datasets. Tracking low-frequency intra-host variants has provided important insights into within-host viral population dynamics and transmission. However, given the higher error rate of ONT, accurate identification of intra-host variants with low allele frequencies remains an open challenge with no viable solutions available. In response to this need, we present Variabel, a novel approach and the first method designed for rescuing low-frequency intra-host variants from ONT data alone. We evaluated Variabel on both within-patient and across-patient paired Illumina and ONT datasets; our results show that Variabel can accurately identify low-frequency variants below 0.5 allele frequency, outperforming existing state-of-the-art ONT variant callers for this task. Variabel is open-source and available for download at: www.gitlab.com/treangenlab/variabel.
The COVID-19 pandemic forever underscored the need for biosurveillance platforms capable of rapid detection of previously unseen pathogens. Oxford Nanopore Technologies (ONT) couples long-read sequencing with in-field capability, opening the door to real-time, in-field biosurveillance. Though a promising technology, streaming assignment of accurate functional and taxonomic labels to nanopore reads remains challenging given that: (i) individual reads can span multiple genes, (ii) individual reads may contain truncated genes and pseudogenes, (iii) the error rate of the ONT platform may introduce frameshifts and missense errors, and (iv) the computational costs of read-by-read analysis may exceed the capacity of in-field computational equipment. Altogether, these challenges highlight a need for novel computational approaches. To this end, we describe SeqScreen-Nano, a novel and portable computational platform for the characterization of novel pathogens. Based on results from simulated and synthetic microbial communities, SeqScreen-Nano can identify Open Reading Frames (ORFs) across the length of raw ONT reads and then use the predicted ORFs for accurate functional characterization and taxonomic classification. SeqScreen-Nano can run efficiently in a memory-constrained environment (less than 32 GB of RAM), allowing it to be utilized in resource-limited settings. SeqScreen-Nano can also process reads directly from the ONT MinION sequencing device, enabling rapid, in-field characterization of previously unseen pathogens. SeqScreen-Nano (v4.0) is available on GitLab at: https://gitlab.com/treangenlab/seqscreen
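The idea of identifying ORFs across the length of a raw read can be illustrated with a minimal six-frame scan. This is only a sketch under stated assumptions, not SeqScreen-Nano's actual ORF caller: the function names `find_orfs` and `reverse_complement`, the `min_codons` threshold, and the requirement of an ATG start and a standard stop codon are all hypothetical choices made for illustration.

```python
# Hypothetical illustration of a six-frame ORF scan over one read.
# This is NOT SeqScreen-Nano's implementation; names and thresholds are assumed.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(read, min_codons=30):
    """Return (strand, start, end) tuples for ATG..stop ORFs in all six frames.

    Coordinates are 0-based and end-exclusive, relative to the scanned strand
    (the read itself for "+", its reverse complement for "-"). The length
    filter counts codons from ATG up to, but not including, the stop codon.
    """
    forward = read.upper()
    orfs = []
    for strand, seq in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # open a candidate ORF at the first ATG
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # close the ORF and keep scanning this frame
    return orfs
```

Because the sketch only reports ORFs that have both a start and a stop codon, it would miss the truncated genes and frameshifted ORFs that the abstract lists as challenges; handling those requires more than this simple scan.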
Computational analysis of host-associated microbiomes has opened the door to numerous discoveries relevant to human health and disease. However, contaminant sequences in metagenomic samples can potentially impact the interpretation of findings reported in microbiome studies, especially in low biomass environments. Our hypothesis is that contamination from DNA extraction kits or sampling lab environments will leave taxonomic breadcrumbs across multiple distinct sample types, allowing for the detection of microbial contaminants when negative controls are unavailable. To test this hypothesis, we implemented Squeegee, a de novo contamination detection tool. We tested Squeegee on simulated and real low biomass metagenomic datasets. On the low biomass samples, we compared Squeegee predictions to experimental negative control data and show that Squeegee accurately recovers known contaminants. We also analyzed 749 metagenomic datasets from the Human Microbiome Project and identified likely kit contamination that had not previously been reported. Collectively, our results highlight that Squeegee can identify microbial contaminants with high precision. Squeegee is open-source and available at: https://gitlab.com/treangenlab/squeegee
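The hypothesis that contaminants leave traces across multiple distinct sample types can be sketched as a simple prevalence filter. This is a toy illustration, not Squeegee's actual algorithm: the function name `candidate_contaminants`, the `min_types` threshold, and the input layout (one `(sample_type, taxa)` pair per sample) are assumptions, and the taxon and sample-type labels in the usage note are placeholders rather than real data.

```python
# Hypothetical illustration: flag taxa seen in several distinct sample types.
# This is NOT Squeegee's method; the threshold and data layout are assumed.
from collections import defaultdict


def candidate_contaminants(samples, min_types=3):
    """Return a sorted list of taxa found in at least `min_types` sample types.

    `samples` is an iterable of (sample_type, taxa) pairs, where `taxa` is an
    iterable of taxon labels detected in that sample.
    """
    types_per_taxon = defaultdict(set)
    for sample_type, taxa in samples:
        for taxon in taxa:
            types_per_taxon[taxon].add(sample_type)
    return sorted(
        taxon
        for taxon, sample_types in types_per_taxon.items()
        if len(sample_types) >= min_types
    )
```

For example, with placeholder inputs `[("type_1", {"taxon_A", "taxon_B"}), ("type_2", {"taxon_A", "taxon_C"}), ("type_3", {"taxon_A", "taxon_C"})]`, only `taxon_A` appears in three distinct sample types and would be flagged. A real tool would also need to separate genuinely shared taxa from contaminants, which this filter does not attempt.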
Tiled amplicon sequencing has served as an essential tool for tracking the spread and evolution of SARS-CoV-2 in real time directly from environmental and clinical samples. Over 14 million SARS-CoV-2 genomes are now available on GISAID, most sequenced and assembled via tiled amplicon sequencing. While computational tools for tiled amplicon design exist, they require downstream manual optimization both computationally and experimentally, which is slow, laborious, and costly. Here, we present Olivar, the first open-source computational tool capable of fully automating the design of tiled amplicons by integrating SNPs, non-specific amplification, etc. into a "risk score" for each single nucleotide of the target genome. Olivar evaluates thousands of sets of possible tiled amplicons and minimizes primer dimers in parallel. In a direct in-silico comparison with PrimalScheme, we show that Olivar has fewer SNPs overlapping with primers and fewer predicted PCR byproducts. We also compared Olivar head-to-head with ARTIC v4.1, the most widely used tiled amplicons for SARS-CoV-2 sequencing. We next tested Olivar on real wastewater samples and found that our automated approach had up to 3-fold higher mapping rates compared to ARTIC v4.1 while retaining similar coverage. To the best of our knowledge, Olivar represents the first open-source, fully automated design tool that simultaneously evaluates and optimizes risks of known primer design issues for robust tiled amplicon sequencing. Olivar is available as a web application at https://olivar.rice.edu/. Olivar can also be installed locally as a command line tool with Bioconda. Source code, installation guide and usage are available at https://gitlab.com/treangenlab/olivar.
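One way to picture how a per-nucleotide risk score could guide primer placement is to look for the window with the lowest summed risk. The following is a minimal sketch under stated assumptions, not Olivar's actual optimization: the function name `lowest_risk_window`, the use of a plain sum as the window cost, and the toy risk values are all hypothetical.

```python
# Hypothetical illustration: choose the primer-length window with minimal
# summed per-nucleotide risk. This is NOT Olivar's algorithm.


def lowest_risk_window(risk, window):
    """Return (start, total_risk) for the window with the smallest summed risk.

    `risk` is a sequence of per-nucleotide risk scores and `window` is the
    primer length in nucleotides. Ties are resolved in favor of the leftmost
    window. A sliding sum keeps the scan linear in the sequence length.
    """
    if window <= 0 or window > len(risk):
        raise ValueError("window must be between 1 and len(risk)")
    current = sum(risk[:window])
    best_start, best_total = 0, current
    for start in range(1, len(risk) - window + 1):
        # Slide one position: add the entering base, drop the leaving base.
        current += risk[start + window - 1] - risk[start - 1]
        if current < best_total:
            best_start, best_total = start, current
    return best_start, best_total


# Toy per-nucleotide risk values (made up for illustration only).
toy_risk = [5, 1, 0, 0, 3, 4, 0, 1]
best_start, best_total = lowest_risk_window(toy_risk, window=3)
```

A full design tool would have to place many primers at once while also accounting for amplicon length, tiling overlap, and primer dimers, which is why the abstract describes evaluating thousands of candidate amplicon sets rather than a single window.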
Motivation: Influenza is a rapidly mutating RNA virus responsible for annual epidemics causing substantial morbidity, mortality, and economic loss. Characterizing influenza virus mutational diversity and evolutionary processes within and between human hosts can provide tools to help track and understand transmission events. In this study, we investigated possible differences between the intrahost genomic content of influenza virus in upper respiratory swabs and exhaled aerosols thought to be enriched for virus from the lower respiratory tract.

Results: We examined the sequences of specimens collected from influenza A virus (IAV) infected college community members from December 2012 through May 2013. We analyzed four types of IAV samples (fine ≤5 µm aerosols (N=38), coarse >5 µm aerosols (N=27), nasopharyngeal swabs (N=53), and oropharyngeal swabs (N=47)) collected from 42 study participants with 60 sampling instances. Eighteen (42.9%) participants had data from all four sample types (nasopharyngeal swab, oropharyngeal swab, coarse aerosol, fine aerosol) included in the analysis, 10 (23.8%) had data from three sample types, 10 (23.8%) had data from two sample types, and 4 (9.5%) had data from one sample type. We found that 481 (53.3%) consensus single nucleotide polymorphisms are shared by all sample types and 600 (66.5%) are shared by at least three different sample types. We observed that, within a single patient, consensus and non-consensus single nucleotide variants are shared across all sample types. Finally, we inferred a phylogenetic tree using consensus sequences and found that samples derived from a single patient are monophyletic.

Conclusions: Single nucleotide polymorphisms did not differentiate between samples of varying origin along the respiratory tree. We found that signatures of variation in non-consensus intrahost single nucleotide variants are host- and sample-specific, but not site-specific. We conclude that the genomic information available does not allow us to discern a transmission route. Future investigation into whether any site-specific mutational signatures emerge over a longer period of infection, for example in immunocompromised hosts, could be informative from the virus evolution perspective.