bioRxiv preprint first posted online Apr. 19, 2018; doi: http://dx.doi.org/10.1101/304972. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Accurate species-level taxonomic classification and profiling of complex microbial communities remains a challenge due to homologous regions shared among closely related species and a sparse representation of non-human associated microbes in the database. Although the database undoubtedly has a strong influence on the sensitivity of taxonomic classifiers and profilers, to date, no study has carefully examined this influence across historical RefSeq releases or assessed its impact on accuracy. In this study, we examined the influence of the database, over time, on k-mer based sequence classification and profiling. We present three major findings: (i) database growth over time resulted in more classified reads, but fewer species-level classifications and more species-level misclassifications; (ii) Bayesian re-estimation of abundance helped to recover species-level classifications when the exact target strain was present; and (iii) Bayesian re-estimation struggled when the database lacked the target strain, resulting in a notable decrease in accuracy. In summary, our findings suggest that the growth of RefSeq over time has strongly influenced the accuracy of k-mer based classification and profiling methods, yielding different classification results depending on the particular database used. These results suggest a need for new algorithms specially adapted for large genome collections and better measures of classification uncertainty.
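To make finding (i) concrete, the following minimal sketch shows one way database growth can push a read from a species-level to a genus-level assignment in a k-mer classifier that labels each k-mer with the lowest common ancestor (LCA) of the genomes containing it. It is a toy illustration under stated assumptions, not the pipeline evaluated in this study: the taxon names, sequences, and helper functions (`build_db`, `classify`) are hypothetical, and the majority vote over k-mer hits is a simplification of how real classifiers score reads.

```python
from collections import Counter

# Hypothetical toy taxonomy: child -> parent.
PARENT = {"species_A": "genus_X", "species_B": "genus_X", "genus_X": "root"}


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def lineage(taxon):
    """Return the path from a taxon up to the root."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lca(a, b):
    """Return the lowest common ancestor of two taxa."""
    ancestors = set(lineage(a))
    for taxon in lineage(b):
        if taxon in ancestors:
            return taxon
    return "root"


def build_db(genomes, k):
    """Map each k-mer to the LCA of every taxon whose genome contains it."""
    db = {}
    for taxon, seq in genomes.items():
        for km in kmers(seq, k):
            db[km] = lca(db[km], taxon) if km in db else taxon
    return db


def classify(read, db, k):
    """Assign the read to the taxon with the most k-mer hits (None if no hits).

    Simplifying assumption: a plain majority vote, not a weighted
    root-to-leaf path score.
    """
    hits = Counter(db[km] for km in kmers(read, k) if km in db)
    return hits.most_common(1)[0][0] if hits else None


K = 5
# Made-up genomes: the two species share their first ten bases.
GENOME_A = "ACGTACGGTTCAGC"
GENOME_B = "ACGTACGGTTGGAT"
READ = "ACGTACGGTT"  # drawn from the shared region

old_db = build_db({"species_A": GENOME_A}, K)                          # earlier release
new_db = build_db({"species_A": GENOME_A, "species_B": GENOME_B}, K)  # later release

print(classify(READ, old_db, K))
print(classify(READ, new_db, K))
```

With the smaller database every k-mer in the read is unique to `species_A`, so the read receives a species-level label. Once the closely related `species_B` is added, the same k-mers are relabeled with the shared genus, and the read can only be placed at genus level.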