Background: The low cost of 16S rRNA gene sequencing facilitates population-scale molecular epidemiological studies. Existing computational algorithms can parse 16S rRNA gene sequences into high-resolution Amplicon Sequence Variants (ASVs), which represent ecologically coherent entities. Assigning species-level taxonomy to these ASVs is the critical remaining barrier to drawing ecologically/clinically relevant inferences from, and comparing data across, 16S rRNA gene-based microbiota studies.

Results: To overcome this barrier, we developed a broadly applicable method for constructing a phylogeny-based, high-resolution, habitat-specific training set. When used with the naïve Bayesian Ribosomal Database Project (RDP) Classifier, this training set achieved species/supraspecies-level taxonomic assignment of 16S rRNA gene-derived ASVs. The key steps for generating such a training set are 1) constructing an accurate and comprehensive phylogeny-based, habitat-specific database; 2) compiling multiple 16S rRNA gene sequences to represent the natural sequence variability of each taxon in the database; 3) trimming the training set to match the sequenced regions, if necessary; and 4) placing species that share closely related sequences into a supraspecies taxonomic level to maintain subgenus resolution. As proof of principle, we developed a V1-V3 region training set for the bacterial microbiota of the human aerodigestive tract using our expanded Human Oral Microbiome Database (eHOMD). In addition, we overcame technical limitations to successfully use Illumina sequences of the 16S rRNA gene V1-V3 region, the most informative segment for classifying bacteria native to the human aerodigestive tract.
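Step 4, grouping species that share closely related sequences into a supraspecies level, can be illustrated with a minimal Python sketch. It assumes that each species' sequences are already trimmed to the target region and pre-aligned to equal length, compares species only within the same genus, and merges species whose most similar pair of sequences meets an identity threshold. The function names, the threshold value, and the `Genus epithet1_epithet2` label format are hypothetical illustrations, not the procedure or naming used for eHOMD.

```python
from itertools import combinations

# Hypothetical cutoff chosen for illustration only.
DEFAULT_THRESHOLD = 0.99


def identity(a: str, b: str) -> float:
    """Fraction of identical columns between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def group_supraspecies(seqs_by_species: dict[str, list[str]],
                       threshold: float = DEFAULT_THRESHOLD) -> dict[str, str]:
    """Map each species name ("Genus epithet") to a species or hypothetical supraspecies label."""
    parent = {sp: sp for sp in seqs_by_species}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Compare species only within the same genus, so grouping stays below genus level.
    by_genus: dict[str, list[str]] = {}
    for sp in sorted(seqs_by_species):
        by_genus.setdefault(sp.split()[0], []).append(sp)

    for members in by_genus.values():
        for s1, s2 in combinations(members, 2):
            # Closest pair of sequences across the two species' variant sets (step 2).
            best = max(identity(a, b)
                       for a in seqs_by_species[s1]
                       for b in seqs_by_species[s2])
            if best >= threshold:
                parent[find(s1)] = find(s2)

    # Collect connected components; members are appended in sorted order.
    groups: dict[str, list[str]] = {}
    for sp in sorted(seqs_by_species):
        groups.setdefault(find(sp), []).append(sp)

    labels: dict[str, str] = {}
    for members in groups.values():
        genus = members[0].split()[0]
        epithets = [m.split(maxsplit=1)[1] for m in members]
        label = members[0] if len(members) == 1 else f"{genus} {'_'.join(epithets)}"
        for m in members:
            labels[m] = label
    return labels


if __name__ == "__main__":
    # Toy 20-column "alignments"; not real 16S rRNA gene sequences.
    toy = {
        "Streptococcus mitis": ["ACGTACGTACGTACGTACGT"],
        "Streptococcus oralis": ["ACGTACGTACGTACGTACGA"],
        "Streptococcus salivarius": ["TTTTACGTACGTACGTACGT"],
        "Corynebacterium accolens": ["ACGTACGTACGTACGTACGT"],
    }
    for name, label in group_supraspecies(toy, threshold=0.9).items():
        print(f"{name} -> {label}")
```

With `threshold=0.9`, the two toy Streptococcus sequences that differ at one of 20 columns (identity 0.95) are merged into one supraspecies label, while the more divergent species and the species from another genus keep their own labels. Because the merge uses union-find, grouping is transitive (single linkage): if A is close to B and B is close to C, all three end up together even when A and C are not close. A phylogeny-based approach, as described above, would likely make this decision from a tree rather than from raw pairwise identity.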
We also generated a full-length eHOMD 16S rRNA gene training set, which we used in conjunction with an independent PacBio Single Molecule, Real-Time (SMRT)-sequenced sinonasal dataset to validate the representation of species in our training set. This analysis also established the effectiveness of a full-length training set for assigning taxonomy to long-read 16S rRNA gene datasets.

Conclusion: Here, we present a systematic approach for constructing a phylogeny-based, high-resolution, habitat-specific training set that permits species/supraspecies-level taxonomic assignment to short- and long-read 16S rRNA gene-derived ASVs. This advancement enhances the ecological and/or clinical relevance of 16S rRNA gene-based microbiota studies.

In microbiota studies of most ecosystems and/or habitats, achieving ecologically and/or clinically relevant results requires species-level identification of constituents. For example, species-level identification is often critically important for host-associated microbial communities because these frequently include commensal and pathogenic species of the same genus, e.g., [1, 2]. In addition, some microbial genera include species that are site specialists and inhabit distinct niches of a given environment [3]. Although hig...