High-throughput sequencing of 16S rRNA genes has increased our understanding of microbial community structure, but even higher-throughput methods at the Illumina scale now allow the creation of much larger datasets, with more samples and orders of magnitude more sequences, that swamp current analytic methods. We developed a method capable of handling these larger datasets by assigning sequences to an existing taxonomy using a supervised learning approach (taxonomy-supervised analysis). We compared this method with a commonly used clustering approach based on sequence similarity (taxonomy-unsupervised analysis). We sampled 211 different bacterial communities from various habitats and obtained ~1.3 million 16S rRNA sequences spanning the V4 hypervariable region by pyrosequencing. Both methodologies gave similar ecological conclusions: β-diversity measures calculated from the two types of matrices were significantly correlated with each other, as were the ordination configurations and hierarchical clustering dendrograms. In addition, our taxonomy-supervised analyses were highly correlated with phylogenetic methods, such as UniFrac. The taxonomy-supervised analysis has the advantages that it is not limited by the exhaustive computation required for the alignment and clustering necessary for the taxonomy-unsupervised analysis, is more tolerant of sequencing errors, and allows comparisons when sequences are from different regions of the 16S rRNA gene. With the tremendous expansion in 16S rRNA data acquisition underway, the taxonomy-supervised approach offers the potential to provide more rapid and extensive community comparisons across habitats and samples.

taxonomy bin | operational taxonomic unit

The increasing abundance of 16S rRNA gene sequences, stimulated by reduced sequencing costs and greatly expanded parallel capacities, is providing a more encompassing view of microbial communities (1).
Although the short read lengths provided by the current technologies make it more challenging to assign sequences to bacterial taxonomy, the depth and replication provided are powerful advantages (2-4).

Information on bacterial community structure can be compiled in a matrix where different communities are represented as rows and "species" as columns, i.e., a community-by-species matrix. When describing bacterial community relationships based on 16S rRNA gene sequences, each sequence is allocated to a species, usually termed an operational taxonomic unit (OTU), by alignment-based clustering at a specified nucleotide distance, often corresponding to 97% identity. The resulting community-by-OTU matrix, which is based exclusively on nucleotide distances among 16S rRNA gene sequences, has bacterial communities as rows and OTUs as columns. It can be used to measure dissimilarities between bacterial communities (β-diversity) using either presence/absence or abundance data. These dissimilarities, combined in a distance matrix, can be used for bacterial community comparisons by ordination and clustering m...
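
As an illustration of the community-by-OTU matrix and the β-diversity measures described above, the following minimal sketch tallies per-sample OTU assignments into a count matrix and derives two pairwise distance matrices from it. Several things here are assumptions rather than details from the text: the sample and OTU names are hypothetical toy data, the function names are invented for this example, and Bray-Curtis (abundance-based) and Jaccard (presence/absence-based) are used only as familiar examples of the two kinds of dissimilarity, not necessarily the measures used in this study.

```python
from collections import Counter


def community_by_otu(assignments):
    """Tally (sample, otu) pairs into a count matrix: rows = samples, columns = OTUs."""
    samples = sorted({s for s, _ in assignments})
    otus = sorted({o for _, o in assignments})
    tally = Counter(assignments)
    counts = [[tally[(s, o)] for o in otus] for s in samples]
    return samples, otus, counts


def bray_curtis(x, y):
    """Abundance-based dissimilarity between two rows of counts."""
    total = sum(x) + sum(y)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(x, y)) / total


def jaccard(x, y):
    """Presence/absence dissimilarity: 1 - shared OTUs / OTUs seen in either sample."""
    shared = sum(1 for a, b in zip(x, y) if a > 0 and b > 0)
    union = sum(1 for a, b in zip(x, y) if a > 0 or b > 0)
    return 0.0 if union == 0 else 1 - shared / union


def distance_matrix(counts, metric):
    """Apply a pairwise metric to every pair of rows (communities)."""
    return [[metric(x, y) for y in counts] for x in counts]


# Hypothetical toy data: one (sample, OTU) pair per sequence read.
reads = (
    [("soil", "otu_a")] * 3 + [("soil", "otu_b")]
    + [("soil2", "otu_a")] * 3 + [("soil2", "otu_b")]
    + [("gut", "otu_a")] + [("gut", "otu_c")] * 3
)

samples, otus, counts = community_by_otu(reads)
bc = distance_matrix(counts, bray_curtis)
jac = distance_matrix(counts, jaccard)

for name, row in zip(samples, bc):
    print(name, [round(v, 3) for v in row])
```

Either distance matrix could then be passed to an ordination or hierarchical clustering routine. The same functions would apply unchanged if the columns were taxonomy bins rather than OTUs, since only the column definitions differ.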