Shotgun metagenomic sequencing does not depend on gene-targeted primers or PCR amplification; thus, it is not affected by primer bias or chimeras. However, searching for rRNA genes in large shotgun Illumina data sets is computationally expensive, and no approach exists for unsupervised community analysis of small-subunit (SSU) rRNA gene fragments retrieved from shotgun data. We present a pipeline, SSUsearch, that identifies SSU rRNA gene fragments faster and enables unsupervised community analysis with shotgun data. It also includes classification and copy number correction, and the output can be used by traditional amplicon analysis platforms. Shotgun metagenome data analyzed with this pipeline yielded higher diversity estimates than amplicon data but retained the grouping of samples in ordination analyses. We applied this pipeline to soil samples with paired shotgun and amplicon data and confirmed bias against Verrucomicrobia in a commonly used V6-V8 primer set; we also discovered likely bias against Actinobacteria and for Verrucomicrobia in a commonly used V4 primer set. This pipeline can utilize all variable regions in SSU rRNA and also can be applied to large-subunit (LSU) rRNA genes for confirmation of community structure. The pipeline can scale to handle large amounts of soil metagenomic data (5 Gb of memory and 5 central processing unit hours to process 38 Gb [1 lane] of trimmed Illumina HiSeq2500 data) and is freely available at https://github.com/dib-lab/SSUsearch under a BSD license.
Microbial phylogeny, identification, and evolution studies were revolutionized by the introduction of small-subunit (SSU) rRNA analysis 25 years ago (1), and with the advent of PCR and high-throughput sequencing, community structure studies are now commonplace (2-5). The growing sizes of SSU rRNA gene databases provide a rich ecological and phylogenetic context for SSU rRNA gene-based community structure surveys (6, 7). However, the accuracy of PCR-based amplicon approaches is reduced by primer bias and chimeras (8, 9).
Unlike gene-targeted amplicon sequencing, shotgun sequencing samples the entire community by sequencing randomly sheared fragments of DNA (10, 11). Hence, while amplicon sequencing can provide far deeper coverage of SSU rRNA genes with the same amount of sequencing, shotgun sequencing may provide a more accurate characterization of microbial diversity, including functional diversity (12). In particular, shotgun sequencing may provide an improved means to detect divergent sequences not recovered by standard SSU rRNA gene primers, such as those of Verrucomicrobia, as well as eukaryotic members of the community (8, 12-14). Note that both approaches remain prone to sequencing error and to bias from environmental DNA extraction (9).
The challenges in using shotgun DNA for rRNA analyses lie in efficiently searching for these fragments in large sequence data sets and in the subsequent analysis of the matching short reads. Several methods have been developed for SSU rRNA retrieval in ...