Microbial community profiling using 16S rRNA amplicon sequencing allows taxonomic characterization of diverse microorganisms. While amplicon sequence variant (ASV) methods are increasingly favored for their fine-grained resolution of sequence variants, they often discard substantial portions of sequencing reads during quality control, particularly in datasets with a large number of samples. We present a streamlined pipeline that integrates FastP for read trimming, HmmUFOtu for operational taxonomic unit (OTU) clustering, Vsearch for chimera checking, and Kraken2 for taxonomic assignment. To assess the pipeline’s performance, we reprocessed two published stool datasets of normal Korean populations: one with 890 and the other with 1,462 independent samples. In the first dataset, HmmUFOtu retained 93.2% of over 104 million read pairs after quality trimming and removal of chimeric or unclassifiable reads, while DADA2, a commonly used ASV method, retained only 44.6% of the reads. Nonetheless, both methods yielded qualitatively similar β-diversity plots. For the second dataset, HmmUFOtu retained 89.2% of read pairs, while DADA2 retained only 18.4%. Because HmmUFOtu is a closed-reference clustering method, it facilitates merging separately processed datasets: OTUs shared between the two datasets showed a correlation coefficient of 0.92 in total abundance (log scale). While the first two dimensions of the β-diversity plot showed the two datasets well mixed, the third dimension revealed a batch effect. Our comparative evaluation of ASV and OTU methods within this streamlined pipeline provides insight into their performance when processing large-scale microbial 16S rRNA amplicon sequencing data, and highlights the strengths of HmmUFOtu and its potential for dataset merging.
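The log-scale correlation between shared OTUs mentioned above can be illustrated with a minimal sketch. It assumes each dataset has been reduced to a mapping from OTU identifier to total read count; the function name, the OTU identifiers, and the counts are hypothetical, and the abstract does not state which correlation coefficient was used, so Pearson on log10 counts is an assumption here.

```python
import math


def log_abundance_correlation(counts_a, counts_b):
    """Pearson correlation of log10 total abundance over OTUs present in both datasets."""
    # Keep only OTUs observed (nonzero) in both datasets, so the log is defined.
    shared = sorted(
        otu for otu in counts_a.keys() & counts_b.keys()
        if counts_a[otu] > 0 and counts_b[otu] > 0
    )
    if len(shared) < 2:
        raise ValueError("need at least two shared OTUs with nonzero counts")
    xs = [math.log10(counts_a[otu]) for otu in shared]
    ys = [math.log10(counts_b[otu]) for otu in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one dataset has constant abundance")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy counts: OTU_4 and OTU_5 are each seen in only one dataset.
dataset_a = {"OTU_1": 1000, "OTU_2": 100, "OTU_3": 10, "OTU_4": 5}
dataset_b = {"OTU_1": 10000, "OTU_2": 1000, "OTU_3": 100, "OTU_5": 7}
r = log_abundance_correlation(dataset_a, dataset_b)
```

Because a closed-reference method assigns reads to a fixed set of reference OTUs, identifiers from separately processed datasets can be matched directly, which is what the key intersection in this sketch relies on.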
Almost half of patients show no primary or secondary response to monoclonal anti-tumor necrosis factor α (anti-TNF) antibody treatment for inflammatory bowel disease (IBD), yet the exact mechanisms of a non-durable response (NDR) remain inadequately defined. We used our genome-wide genotype data to impute expression values as features in training machine learning models to predict an NDR. Blood samples from various IBD cohorts were used for genotyping with the Korea Biobank Array. A total of 234 patients with Crohn’s disease (CD) who received their first anti-TNF therapy were enrolled. The expression profiles of 6294 genes in whole-blood tissue imputed from the genotype data were combined with clinical parameters to train a logistic model to predict the NDR. The three most significant features were genetic (DPY19L3, GSTT1, and NUCB1), not clinical. Logistic regression of NDR vs. durable response (DR) status in our cohort on the imputed expression levels showed that the β coefficients were positive for DPY19L3 and GSTT1 and negative for NUCB1, concordant with the known eQTL information. Machine learning models using imputed gene expression features effectively predicted NDR to anti-TNF agents in patients with CD.
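To make the sign interpretation of the β coefficients concrete, here is a minimal sketch of a logistic regression fitted by gradient ascent on toy data. It is not the authors' model: the two feature columns (labeled `gene_a` and `gene_b` in the comments), the values, and the hyperparameters are all hypothetical. A positive coefficient means higher imputed expression raises the predicted odds of NDR; a negative one lowers them.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, labels, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient ascent on the log-likelihood."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = y - sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
            grad_b += error
            for j, v in enumerate(x):
                grad_w[j] += error * v
        weights = [w + lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias += lr * grad_b / len(rows)
    return weights, bias


# Hypothetical standardized expression values: columns are (gene_a, gene_b).
# Label 1 = NDR, label 0 = DR. gene_a is higher and gene_b lower in the NDR rows.
ndr_rows = [(1.2, -0.8), (0.9, -0.3), (0.4, -0.1), (0.2, -0.5)]
dr_rows = [(-a, -b) for a, b in ndr_rows]
rows = ndr_rows + dr_rows
labels = [1] * len(ndr_rows) + [0] * len(dr_rows)

weights, bias = fit_logistic(rows, labels)
beta_gene_a, beta_gene_b = weights
```

On this toy data the fitted coefficient is positive for `gene_a` and negative for `gene_b`, mirroring the kind of sign pattern the abstract compares against eQTL direction.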
Although gut microbiome dysbiosis has been associated with inflammatory bowel disease (IBD), the relationship between the oral microbiota and IBD remains poorly understood. This study aimed to identify unique microbiome patterns in saliva from IBD patients and explore potential oral microbial markers for differentiating Crohn’s disease (CD) and ulcerative colitis (UC). In a prospective cohort study, IBD patients (UC: n = 175, CD: n = 127) and healthy controls (HC: n = 100) were recruited, and their oral microbiota were analyzed using 16S rRNA gene sequencing. Sparse partial least squares discriminant analysis (sPLS-DA) models were trained with the sequencing data to classify CD and UC. Taxonomic classification using Kraken2 and the SILVA reference database yielded 4041 phylotypes. After quality filtering, 398 samples (UC: n = 175, CD: n = 124, HC: n = 99) and 2711 phylotypes were included. Alpha diversity analysis revealed significantly reduced richness in the microbiome of IBD patients compared to healthy controls. The sPLS-DA model achieved high accuracy (mean accuracy: 0.908; AUC: 0.966) in distinguishing IBD vs. HC, as well as good accuracy (0.846) and AUC (0.923) in differentiating CD vs. UC. These findings highlight distinct oral microbiome patterns in IBD and provide insights into potential diagnostic markers.
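The two reported metrics, accuracy and AUC, measure different things: accuracy depends on a decision threshold, while AUC summarizes ranking quality across all thresholds. The sketch below computes both from classifier scores in plain Python. The labels and scores are hypothetical toy values, not the study's data, and the 0.5 threshold is an assumption.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy example: 1 = IBD, 0 = healthy control.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
acc = accuracy(labels, scores)  # 4 of 6 correct at threshold 0.5
auc = roc_auc(labels, scores)   # 8 of 9 positive/negative pairs ranked correctly
```

The pairwise loop is quadratic in sample count, which is acceptable for a few hundred samples; a rank-based formulation would be the usual choice for larger sets.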
Training an automatic brain tumor segmentation model requires a large amount of data. In this paper, we propose a strategy to overcome the limited amount of clinically collected magnetic resonance imaging (MRI) data on meningiomas by pre-training a model on a larger public dataset of glioma MRIs and augmenting our meningioma training set with normal brain MRIs. Pre-operative MRIs of 91 meningioma patients (171 MRIs) and 10 non-meningioma patients (normal brains) were collected between 2016 and 2019. A three-dimensional (3D) U-Net was used as the base architecture. The model was pre-trained with BraTS 2019 data, then fine-tuned with our dataset of 154 meningioma MRIs and 10 normal brain MRIs. To increase the utility of the normal brain MRIs, a novel balanced Dice loss (BDL) function was used instead of the conventional soft Dice loss function. Model performance was evaluated using Dice scores on the remaining 17 meningioma MRIs. Segmentation performance improved sequentially with pre-training and the inclusion of normal brain images. The Dice score improved from 0.72 to 0.76 when the model was pre-trained, and including normal brain MRIs in fine-tuning raised it further to 0.79. With BDL as the loss function, the Dice score reached 0.84. The proposed learning strategy for U-Net showed potential for use in segmenting meningioma lesions.
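For reference, the Dice score used for evaluation and the conventional soft Dice loss can be sketched as follows on flattened voxel lists. This is the standard formulation only; the abstract does not define the balanced Dice loss, so BDL is not reproduced here. The function names, the `smooth` term, and the toy masks are assumptions for illustration.

```python
def dice_score(pred_mask, true_mask):
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    # Convention chosen here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def soft_dice_loss(probs, target, smooth=1.0):
    """Conventional soft Dice loss on predicted foreground probabilities."""
    intersection = sum(p * t for p, t in zip(probs, target))
    denominator = sum(probs) + sum(target)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


# Hypothetical six-voxel example with a three-voxel lesion.
true_mask = [0, 1, 1, 1, 0, 0]
pred_mask = [0, 1, 1, 0, 1, 0]
score = dice_score(pred_mask, true_mask)  # 2 * 2 / (3 + 3)

# A normal brain has an all-zero target, so the overlap term is always zero
# and the loss depends only on how much foreground the model predicts.
normal_target = [0, 0, 0, 0]
loss_clean = soft_dice_loss([0.0, 0.0, 0.0, 0.0], normal_target)
loss_false_positive = soft_dice_loss([0.0, 1.0, 1.0, 0.0], normal_target)
```

With an all-zero target the foreground overlap is always zero, which is one way to see why normal brain images interact awkwardly with a loss built around lesion overlap.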