Background
Non-toxigenic Corynebacterium diphtheriae strains are emerging as a major cause of severe pharyngitis and tonsillitis, as well as of invasive diseases such as endocarditis, septic arthritis, splenic abscesses and osteomyelitis. C. diphtheriae strains have been reported to vary in their ability to adhere to and invade different cell lines. To identify the genetic basis of this variation in pathogenicity, we sequenced the genomes of four strains of C. diphtheriae (ISS 3319, ISS 4060, ISS 4746 and ISS 4749) that are well characterised in terms of their ability to adhere to and invade mammalian cells.

Results
Comparative analyses of 20 C. diphtheriae genome sequences, including 16 publicly available genomes, revealed a pan-genome of 3,989 protein-coding sequences: 1,625 core genes and 2,364 accessory genes. Most of the genomic variation between these strains involves uncharacterised genes encoding hypothetical proteins or transposases. Further analyses of the protein sequences with an array of bioinformatic tools predicted that most of the accessory proteome is located in the cytoplasm. The membrane-associated and secreted proteins are generally involved in adhesion and virulence. The strains differed in their genes encoding membrane-associated proteins, especially in the number and organisation of the pilus (spa) gene clusters and in the number of genes encoding surface proteins with LPXTG motifs. Further variation was found among genes encoding extracellular proteins, especially substrate-binding proteins of different functional classes of ABC transport systems and ‘non-classical’ secreted proteins.

Conclusions
The structure and organisation of the spa gene clusters correlate with differences in the ability of C. diphtheriae strains to adhere to and invade host cells.
Furthermore, differences in the number of genes encoding membrane-associated proteins, such as additional proteins with LPXTG motifs, could also cause variation in adhesive properties between strains. Variation in the secreted proteome may be associated with the degree of pathogenicity. The role of the ‘non-classical’ secretome in virulence remains unclear. Differences in the substrate-binding proteins of various ABC transport systems and in cytoplasmic proteins may, however, indicate strain-specific nutritional requirements or differing abilities to utilise various carbon sources.

Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1980-8) contains supplementary material, which is available to authorised users.