Background

Human papillomavirus (HPV) causes almost all invasive cervical cancers and is a major cause of oral and other anogenital malignancies. HPV genotyping by dideoxy (Sanger) sequencing is currently the reference method of choice for clinical diagnostics. However, in samples with multiple HPV infections, dideoxy sequencing reports only a single genotype, and that call is occasionally imprecise or indeterminable because of overlapping chromatograms. Our aim was to explore and compare HPV metagenomes in abnormal cervical cytology by deep sequencing and to correlate them with disease states.

Results

DNA was extracted from low- and high-grade squamous intraepithelial lesion (LSIL and HSIL) cytology samples for PCR amplification of the HPV E6/E7 genes. HPV-positive samples were sequenced by both dideoxy and deep sequencing methods. Deep sequencing revealed that ~60% of all samples (n = 72) were infected with multiple HPV genotypes. Among LSIL samples (n = 43), 27 different genotypes were found. The 3 dominant (most abundant) genotypes were HPV-39, 11/43 (26%); HPV-16, 9/43 (21%); and HPV-35, 4/43 (9%). Among HSIL samples (n = 29), 17 HPV genotypes were identified; the 3 dominant genotypes were HPV-16, 21/29 (72%); HPV-35, 4/29 (14%); and HPV-39, 3/29 (10%). Phylogenetically, type-specific E6/E7 genetic distances correlated with carcinogenic potential. Species diversity analysis comparing LSIL and HSIL revealed a loss of HPV diversity and domination by HPV-16 in HSIL samples.

Conclusions

Deep sequencing resolves the HPV genotype composition of multi-infected cervical cytology samples. Biodiversity analysis reveals a loss of diversity and a gain of dominance by carcinogenic genotypes in high-grade cytology. Metagenomic profiles may therefore serve as a biomarker of disease severity and as a population surveillance tool for emerging genotypes.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-017-3612-y) contains supplementary material, which is available to authorized users.
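
The diversity comparison summarized in the Results can be illustrated with two standard ecological measures: the Shannon index (higher means more diverse) and the Berger-Parker dominance (the fraction held by the most abundant genotype). The sketch below is only an illustration under stated assumptions and is not the authors' analysis pipeline: the abstract does not say which indices were used, and the per-genotype read counts in `mixed_sample` and `dominated_sample` are hypothetical values invented for the example.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over genotypes with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def berger_parker(counts):
    """Berger-Parker dominance: share of the single most abundant genotype."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    return max(counts) / total


# HYPOTHETICAL read counts per genotype for two samples (not data from the study).
mixed_sample = {"HPV-39": 400, "HPV-16": 300, "HPV-35": 200, "HPV-51": 100}
dominated_sample = {"HPV-16": 940, "HPV-35": 40, "HPV-39": 20}

for name, sample in [("mixed", mixed_sample), ("dominated", dominated_sample)]:
    counts = list(sample.values())
    print(f"{name}: H = {shannon_index(counts):.3f}, "
          f"dominance = {berger_parker(counts):.3f}")
```

In this toy example, the sample dominated by a single genotype has the lower Shannon index and the higher dominance, which is the pattern the abstract describes for HSIL relative to LSIL. The same dominance measure can be read directly from the reported cohort figures: HPV-16 was the dominant genotype in 21/29 (72%) of HSIL samples, whereas the most frequent dominant genotype in LSIL, HPV-39, accounted for only 11/43 (26%).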