Xiyu Peng scite author profile

Burrough

et al. 2020

Front. Microbiol.

Post-weaning diarrhea caused by enterotoxigenic E. coli (ETEC) causes significant economic losses for pig producers. This study tested the hypotheses that an ETEC challenge disrupts intestinal microbial homeostasis and that including dietary soluble fiber (10% sugar beet pulp) or insoluble fiber (15% corn distillers dried grains with solubles), with or without exogenous carbohydrases, protects or restores gut microbial homeostasis in weaned pigs. Sixty crossbred piglets (6.9 ± 0.1 kg) were blocked by body weight and randomly assigned to one of six treatments (n = 10): a nonchallenged control (NC), an ETEC F18-challenged positive control (PC), ETEC-challenged soluble fiber without (SF-) or with carbohydrases (SF+), and ETEC-challenged insoluble fiber without (IF-) or with carbohydrases (IF+). Pigs were housed individually and orally received either ETEC inoculum or PBS-sham inoculum on day 7 post-weaning. Intestinal contents were collected on day 14 or 15. The V4 region of the bacterial 16S rRNA was amplified and sequenced. High-quality reads (total 6,671,739) were selected and clustered into 3,330 OTUs. No differences were observed in α-diversity among treatments. The ileal microbiota in NC and PC had modest separation in the weighted PCoA plot; the microbial structures were slightly altered by SF+ and IF- compared with PC. The PC increased ileal Escherichia-Shigella (P < 0.01) and numerically decreased Lactobacillus compared to NC. Predicted functional pathways enriched in the ileal microbiota of PC pigs indicated enhanced activity of Gram-negative bacteria, in agreement with increased Escherichia-Shigella. The SF+ tended to decrease (P < 0.10) ileal Escherichia-Shigella compared to PC. Greater abundances of ileal Streptococcus, Turicibacter, and Roseburia and colonic Prevotella were observed in SF- and SF+ than PC (P < 0.05). Pigs fed IF+ had greater Lactobacillus and Roseburia than PC pigs (P < 0.05).
The ETEC challenge reduced total volatile fatty acid (VFA) compared with NC (P < 0.05). The SF+ tended to increase (P < 0.10) and SF- significantly

AmpliCI: a high-resolution model-based approach for denoising Illumina amplicon data

Dorman

2020

Motivation: Next-generation amplicon sequencing is a powerful tool for investigating microbial communities. A main challenge is to distinguish true biological variants from errors caused by amplification and sequencing. In traditional analyses, such errors are eliminated by clustering reads within a sequence similarity threshold, usually 97%, and constructing operational taxonomic units, but the arbitrary threshold leads to low resolution and high false positive rates. Recently developed “denoising” methods have proven able to resolve single-nucleotide amplicon variants, but they still miss low frequency sequences, especially those near more frequent sequences, because they ignore the sequencing quality information. Results: We introduce AmpliCI, a reference-free, model-based method for rapidly resolving the number, abundance and identity of error-free sequences in massive Illumina amplicon datasets. AmpliCI takes into account quality information and allows the data, not an arbitrary threshold or an external database, to drive conclusions. AmpliCI estimates a finite mixture model, using a greedy strategy to gradually select error-free sequences and approximately maximize the likelihood. AmpliCI has better performance than three popular denoising methods, with acceptable computation time and memory usage. Availability: Source code is available at https://github.com/DormanLab/AmpliCI. Supplementary information: Supplementary materials are available at Bioinformatics online.
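To make the role of quality information in the abstract above concrete, here is a minimal sketch of how per-base Phred scores can enter a read likelihood. It is not AmpliCI's actual error model. The function names are hypothetical, and the sketch assumes that an erroneous base is equally likely to be any of the three other nucleotides and that positions are independent. Only the Phred definition, P(error) = 10^(-Q/10), is standard.

```python
import math


def phred_to_error_prob(q: int) -> float:
    """Standard Phred definition: P(base call is wrong) = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def read_log_likelihood(read: str, quals: list[int], source: str) -> float:
    """Log-likelihood that `read` was generated from `source`.

    Illustrative assumptions (not taken from the AmpliCI paper):
    positions are independent, and a miscalled base is uniformly one of
    the three other nucleotides.
    """
    if not (len(read) == len(quals) == len(source)):
        raise ValueError("read, quals and source must have equal length")
    total = 0.0
    for base, q, true_base in zip(read, quals, source):
        # Clamp so that very low scores (e.g. Q=0) never give log(0).
        p_err = min(phred_to_error_prob(q), 0.75)
        if base == true_base:
            total += math.log(1.0 - p_err)
        else:
            total += math.log(p_err / 3.0)
    return total


if __name__ == "__main__":
    source = "ACGT"
    read = "ACGA"  # differs from the source at the last position
    # The same mismatch is scored with a low and a high quality value.
    print(read_log_likelihood(read, [40, 40, 40, 2], source))
    print(read_log_likelihood(read, [40, 40, 40, 40], source))
```

Under these assumptions, a mismatch at a low-quality position is far more plausible as a sequencing error than the same mismatch at a high-quality position. A method that ignores quality scores cannot make that distinction.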

AmpliCI: A High-resolution Model-Based Approach for Denoising Illumina Amplicon Data

Dorman

2020

Preprint

Motivation: Next-generation amplicon sequencing is a powerful tool for investigating microbial communities. One main challenge is to distinguish true biological variants from errors caused by PCR and sequencing. In the traditional analysis pipeline, such errors are eliminated by clustering reads within a sequence similarity threshold, usually 97%, and constructing operational taxonomic units, but the arbitrary threshold leads to low resolution and high false positive rates. Recently developed "denoising" methods have proven able to resolve single-nucleotide amplicon variants, but they still miss low frequency sequences, especially those near abundant variants, because they ignore the sequencing quality information. Results: We introduce AmpliCI, a reference-free, model-based method for rapidly resolving the number, abundance and identity of error-free sequences in massive Illumina amplicon datasets. AmpliCI takes into account quality information and allows the data, not an arbitrary threshold or an external database, to drive conclusions. AmpliCI estimates a finite mixture model, using a greedy strategy to gradually select error-free sequences and approximately maximize the likelihood. We show that AmpliCI is superior to three popular denoising methods, with acceptable computation time and memory usage. Availability: Source code available at https://github.com/DormanLab/AmpliCI

The utility of biomarkers is degraded by sequencing errors, PCR amplification errors, and intrastrain/species-specific variability [1]. To account for these factors, a typical first step of microbiome analysis is to resolve the data into Operational Taxonomic Units (OTUs), or clusters of sequences with 97% or greater similarity.
There are many methods for identifying OTUs [2], roughly classifiable into closed-reference methods, which use a reference database of known organisms, or de novo methods. However, when applied to mock communities, it is widely found that both types of methods cannot accurately identify true OTUs in a sample [3,4,5,6,7,8]. OTUs are problematic entities, lacking both biological and physical interpretability. They only roughly correspond to biological species, genera or higher taxonomic entities, and they do not correspond to true, error-free sequences in the sample. Thus, OTU-based methods are prone to both false positives and negatives, reporting error sequences as OTUs and missing subtle and real biological sequence variation, such as SNPs. The 97% threshold, motivated by empirical studies [9,10], fails to reliably achieve genus or species level resolution [11,12]. There are distinct species with 97% or more similar 16S rRNA [13,14], and strains whose 16S rRNA locally differ by more than 3% [15]. Amplicon sequencing data from current Illumina platforms support de novo single-nucleotide resolution [16]. Modern methods attempt to identify all the unique sequences in the sample [17,18,19,20,16,21,22,23]. Such denoising methods make no biological judgment on taxonomic entities, but simply remove or correct sequences produc...

Accurate estimation of molecular counts from amplicon sequence data with unique molecular identifiers

Dorman

2023

Motivation: Amplicon sequencing is widely applied to explore heterogeneity and rare variants in genetic populations. Resolving true biological variants and quantifying their abundance is crucial for downstream analyses, but measured abundances are distorted by stochasticity and bias in amplification, plus errors during Polymerase Chain Reaction (PCR) and sequencing. One solution attaches Unique Molecular Identifiers (UMIs) to sample sequences before amplification, eliminating amplification bias by clustering reads on UMI and counting clusters to quantify abundance. While modern methods improve over naïve clustering by UMI identity, most do not account for UMI reuse, or collision, and they do not adequately model PCR and sequencing errors in the UMIs and sample sequences. Results: We introduce Deduplication and Abundance estimation with UMIs (DAUMI), a probabilistic framework to detect true biological amplicon sequences and accurately estimate their deduplicated abundance. DAUMI recognizes UMI collision, even on highly similar sequences, and detects and corrects most PCR and sequencing errors in the UMI and sampled sequences. DAUMI performs better on simulated and real data compared to other UMI-aware clustering methods. Availability: Source code is available at https://github.com/DormanLab/AmpliCI. Supplementary information: Supplementary materials are available at Bioinformatics online.
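The "naïve clustering by UMI identity" baseline mentioned in the abstract above can be sketched in a few lines. This is an illustration of that baseline only, not of DAUMI. The function name and the majority-vote consensus rule are assumptions made for this example, and the sketch treats every distinct UMI string as exactly one original molecule.

```python
from collections import Counter, defaultdict


def naive_umi_counts(reads: list[tuple[str, str]]) -> dict[str, int]:
    """Estimate molecule counts by grouping reads on exact UMI identity.

    `reads` is a list of (umi, sequence) pairs. Each distinct UMI is
    assumed to tag exactly one original molecule, whose sequence is taken
    to be the most frequent sequence observed with that UMI.
    """
    by_umi: dict[str, Counter] = defaultdict(Counter)
    for umi, seq in reads:
        by_umi[umi][seq] += 1

    molecule_counts: Counter = Counter()
    for seq_counts in by_umi.values():
        # Majority vote within the UMI cluster (hypothetical consensus rule).
        consensus, _ = seq_counts.most_common(1)[0]
        molecule_counts[consensus] += 1
    return dict(molecule_counts)


if __name__ == "__main__":
    reads = [
        ("AAAA", "ACGT"),
        ("AAAA", "ACGT"),
        ("AAAA", "ACGA"),  # minority sequence within the same UMI cluster
        ("CCCC", "ACGT"),
        ("GGGG", "TTGA"),
    ]
    print(naive_umi_counts(reads))
```

This baseline fails in the two ways the abstract names. An error inside a UMI creates a spurious extra cluster and inflates the count. A collision, where two different molecules receive the same UMI, merges them into one cluster and deflates it.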

1297 Uncovering the hidden structure of T cell compositions in peripheral blood after immune checkpoint inhibitor

Lee

Adamow

et al. 2022