Background: Despite recent decreases in the cost of sequencing, shotgun metagenome sequencing remains more expensive than 16S rRNA amplicon sequencing. Methods have been developed to predict the functional profiles of microbial communities based on their taxonomic composition, and PICRUSt is the most widely used of these techniques. In this study, we evaluated the performance of PICRUSt by comparing the significance of differential abundance of functional gene profiles predicted with PICRUSt to that of profiles obtained from shotgun metagenome sequencing across different environments.
Results: We selected 7 datasets of human, non-human animal and environmental (soil) samples that have publicly available 16S rRNA and shotgun metagenome sequences. As expected from previous literature, strong Spearman correlations were observed between gene compositions predicted with PICRUSt and those measured with shotgun metagenome sequencing. However, these strong correlations were preserved even when the sample labels were shuffled. This suggests that a simple correlation coefficient is a highly unreliable measure of the performance of algorithms like PICRUSt. As an alternative, we compared the performance of PICRUSt-predicted genes with that of metagenome genes in inference models associated with metadata within each dataset. With this method, we found reasonable performance for human datasets, with PICRUSt performing better for inference on genes related to "house-keeping" functions. However, the performance of PICRUSt degraded sharply outside of human datasets when used for inference.
Conclusion: We conclude that the utility of PICRUSt for inference with the default database is likely limited outside of human samples, and that the development of gene prediction tools specific to different non-human and environmental samples is warranted.
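Why can correlations stay strong after shuffling sample labels? One plausible mechanism is that gene abundances span orders of magnitude in a way shared by all samples (for example, abundant house-keeping genes versus rare genes), so any predicted profile ranks genes similarly to any measured profile. The following is a minimal synthetic sketch under that assumption, not the study's code or data: the gene counts, noise levels and all function names (`average_ranks`, `spearman`, `simulate`, `mean_correlation`) are hypothetical, and "shuffling" is implemented as a cyclic shift so that every prediction is paired with a different sample.

```python
import random


def average_ranks(values):
    """Return 1-based ranks, averaging tied values as Spearman's rho requires."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of ranks (assumes non-constant inputs)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def simulate(n_genes=200, n_samples=10, seed=0):
    """Hypothetical gene-abundance profiles: one shared baseline plus per-sample signal."""
    rng = random.Random(seed)
    # Gene-level baseline spanning ~4 orders of magnitude, identical in every sample.
    baseline = [10 ** rng.uniform(0, 4) for _ in range(n_genes)]
    measured, predicted = [], []
    for _ in range(n_samples):
        # Sample-specific signal: the part that differential-abundance inference targets.
        truth = [b * rng.lognormvariate(0, 0.5) for b in baseline]
        # Independent multiplicative noise for the "shotgun" and the "predicted" profile.
        measured.append([t * rng.lognormvariate(0, 0.3) for t in truth])
        predicted.append([t * rng.lognormvariate(0, 0.3) for t in truth])
    return predicted, measured


def mean_correlation(predicted, measured, pairing):
    """Average per-sample Spearman rho over (predicted index, measured index) pairs."""
    return sum(spearman(predicted[i], measured[j]) for i, j in pairing) / len(pairing)


predicted, measured = simulate()
n = len(predicted)
matched = mean_correlation(predicted, measured, [(i, i) for i in range(n)])
# Cyclic shift: each prediction is compared with a different sample's measurement.
shuffled = mean_correlation(predicted, measured, [(i, (i + 1) % n) for i in range(n)])
print(f"matched labels:  {matched:.3f}")
print(f"shuffled labels: {shuffled:.3f}")
```

In this toy setting both averages come out high and close to each other, because the shared baseline dominates the gene ranks; a correlation that barely drops under label shuffling says little about whether sample-specific differences were predicted correctly.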
Key words: microbiota functional profile prediction, inference, sample type, functional category

necessary in order to ensure adequate statistical power for detecting true differences [1]. Additionally, metagenome sequencing can be very challenging for low-biomass samples or samples that are dominated by non-microbial DNA [2, 3].

To address this problem, tools have been developed to predict microbial functional genes from taxonomic compositions inferred from more cost-effective amplicon sequencing, including PICRUSt, Tax4Fun and FaproTax [4-6]. Among these tools, PICRUSt is the most widely used and has been applied in hundreds of projects on various environments, including the human gut [7, 8], murine [9, 10], fish [11], coral [12], water [13], plant [14], bioreactor [15] and soil [16]. PICRUSt predicts the genes of organisms without sequenced genomes by mapping their 16S rRNA genes to homologous taxa with fully sequenced genomes. The predictions of PICRUSt are therefore limited by currently available genomes, which are highly biased towards microorganisms associated ...