High-throughput sequencing 16S rRNA gene surveys have enabled new insights into the diversity of soil bacteria, and furthered understanding of the ecological drivers of abundances across landscapes. However, current analytical approaches are of limited use in formalising syntheses of the ecological attributes of taxa discovered, because derived taxonomic units are typically unique to individual studies and sequence identification databases only characterise taxonomy. To address this, we used sequences obtained from a large nationwide soil survey (GB Countryside Survey, henceforth CS) to create a comprehensive soil-specific 16S reference database, with coupled ecological information derived from the survey metadata. Specifically, we modelled taxon responses to soil pH at the OTU level using hierarchical logistic regression (HOF) models, to provide information on putative landscape-scale pH-abundance responses. We identify that most of the soil OTUs examined exhibit predictable abundance responses across soil pH gradients, though with the exception of known acidophilic lineages, the pH optima of OTU relative abundance were variable and could not be generalised by broad taxonomy. This highlights the need for tools and databases to predict ecological traits at finer taxonomic resolution. We further demonstrate the utility of the database by testing against geographically dispersed query 16S datasets, evaluating efficacy by quantifying matches, and accuracy in predicting pH responses of query sequences from a separate large soil survey. We found that the CS database provided good coverage of dominant taxa, and that the taxa indicating soil pH in a query dataset corresponded with the pH classifications of top matches in the CS database.
Furthermore we were able to predict query dataset community structure, using predicted abundances of dominant taxa based on query soil pH data and the HOF models of matched CS database taxa. The database with associated HOF model outputs is released as an online portal for querying single sequences of interest (https://shiny-apps.ceh.ac.uk/ID-TaxER/), and flat files are made available for use in bioinformatic pipelines. The further development of advanced informatics infrastructures incorporating modelled ecological attributes along with new functional genomic information will likely facilitate large scale exploration and prediction of soil microbial functional biodiversity under current and future environmental change scenarios.

Soil bacteria are highly diverse [1, 2] and are significant contributors to soil functionality. Sequencing of 16S rRNA genes has enabled a wealth of new insights into the taxonomic diversity of soil prokaryotic communities, revealing the ecological controls on a vast diversity of yet to be cultured taxa with unknown functional potential [3]. However, despite thousands of studies across the globe, we are still some way from synthesising the new knowledge on the ...