The oral cavity of each person is home to hundreds of bacterial species. While taxa associated with oral diseases have been well studied using both culture-based and amplicon sequencing methods, metagenomic and genomic information remains scarce compared with that for the fecal microbiome. Here we provide metagenomic shotgun data for 3346 oral samples and, together with 808 published samples, assemble 56,213 metagenome-assembled genomes (MAGs). Sixty-four percent of the 3,589 species-level genome bins contained no publicly available genomes, and others contained only a handful. The resulting genome collection is representative of samples from around the world and across physiological conditions. It contains many genomes from the Candidate Phyla Radiation (CPR), for which no monocultures are available, and enabled the discovery of new taxa such as a family within the order Acholeplasmatales. New oral biomarkers were identified for rheumatoid arthritis and colorectal cancer; oral samples would be more convenient to collect than fecal samples. The large number of metagenomic samples also allowed the assembly of many strains from important oral taxa such as Porphyromonas and Neisseria. Predicted functions are enriched in drug metabolism and small-molecule synthesis. Thus, these data lay down a genomic framework for future inquiries into the human oral microbiome.

The human microbiome has been implicated in a growing number of diseases. The majority of microbial cells are believed to reside in the large intestine 1, and cohorts with fecal metagenomic data include over 1000 individuals 2,3. For the oral microbiome, hundreds of shotgun-sequenced metagenomic samples have been available from the Human Microbiome Project (HMP) and from studies of rheumatoid arthritis 4-6. Metagenome-wide association studies (MWAS) of a number of other diseases using gut microbiome data also suggested a potential contribution of the oral microbiome to disease etiology 7-12.
Although the MWAS on rheumatoid arthritis was […] the oral microbiome is believed to be well covered by culturing 13, and analyses by 16S rRNA gene amplicon sequencing or polymerase chain reaction (PCR) are common. Recent large-scale metagenomic assembly efforts mostly included fecal metagenomic data 14-16. It is not clear how much is really missing for the oral microbiome; saliva, in particular, seems to harbor more bacterial species per individual than the fecal microbiome 17.

After contigs are obtained with assembly algorithms suited to metagenomic data 18, metagenomic binning algorithms rely on a central idea: genes or contigs whose abundances co-vary across many samples belong to the same microbial genome 8,19-21. Large cohorts are therefore a prerequisite for high-quality assembly. Here we present 3346 new oral metagenomic samples and 56,213 metagenome-assembled genomes (MAGs), which represent 3,589 species-level clades, revealing new taxa and substantially complementing the genomic content of known species. This ...
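The co-abundance principle behind binning can be illustrated with a toy example. The sketch below is a minimal illustration under stated assumptions, not the binning pipeline used in this work. It links contigs whose per-sample abundance profiles have a Pearson correlation at or above a hypothetical threshold of 0.9 and treats connected components (single linkage) as bins. The function names `pearson` and `bin_by_coabundance`, the threshold, and the abundance values are all invented for illustration. Practical binners such as MetaBAT and CONCOCT also combine abundance with sequence composition (e.g. tetranucleotide frequencies).

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length abundance profiles."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return 0.0  # constant profile: no co-variation signal
    return sum(a * b for a, b in zip(dx, dy)) / denom


def bin_by_coabundance(profiles, threshold=0.9):
    """Group contigs whose abundances co-vary across samples (hypothetical helper).

    profiles: dict mapping contig name -> list of per-sample abundances.
    Contigs with pairwise Pearson r >= threshold are linked; bins are the
    connected components (single linkage), found with a union-find.
    """
    names = list(profiles)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two components

    bins = {}
    for n in names:
        bins.setdefault(find(n), []).append(n)
    # Sort members and bins so the output is deterministic
    return sorted(sorted(members) for members in bins.values())


# Invented abundances of four contigs across five samples
EXAMPLE_PROFILES = {
    "contig_A1": [10, 2, 30, 5, 18],
    "contig_A2": [21, 4, 59, 11, 35],
    "contig_B1": [3, 40, 6, 25, 1],
    "contig_B2": [6, 82, 11, 49, 3],
}

if __name__ == "__main__":
    print(bin_by_coabundance(EXAMPLE_PROFILES))
    # [['contig_A1', 'contig_A2'], ['contig_B1', 'contig_B2']]
```

With only a few samples, unrelated contigs can correlate by chance, which is one way to see why large cohorts help binning.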