Plasmid prediction can be of great interest when studying bacteria such as Enterobacteriaceae. Indeed, many resistance and virulence genes are located on such replicons and can have a major impact on pathogenicity and spreading capacity. Beyond strain outbreaks, plasmid outbreaks have been reported, especially for some extended-spectrum beta-lactamase- or carbapenemase-producing Enterobacteriaceae. Several tools are now available to explore the "plasmidome" from whole-genome sequence data, using a variety of approaches. However, recent benchmarks have highlighted that none of them succeeds in combining high sensitivity and high specificity. With this in mind, we developed PlaScope, a targeted approach to recover plasmid sequences in Escherichia coli. Based on Centrifuge, a metagenomic classifier, and a custom database containing complete sequences of chromosomes and plasmids from various curated databases, it classifies the contigs of an assembly according to their predicted location. Compared to two other plasmid classifiers, Plasflow and cBar, it achieves better recall (0.87), specificity (0.99), precision (0.96) and accuracy (0.98) on a dataset of 70 genomes containing plasmids. Finally, we tested our method on a dataset of E. coli strains exhibiting an elevated rate of chromosomal integration of extended-spectrum beta-lactamase coding genes, and we were able to identify 20/21 of these events. Moreover, the predicted locations of virulence genes and operons were in agreement with the literature. Similar approaches could also be developed for other well-characterized bacteria such as Klebsiella pneumoniae.
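For readers less familiar with the four performance measures quoted above, the following minimal Python sketch shows how recall, specificity, precision and accuracy are conventionally derived from confusion-matrix counts. It assumes that plasmid is treated as the positive class and that counts are tallied per contig; the text above does not state the unit of evaluation. The names `ConfusionCounts` and `classification_metrics` and the example counts are purely illustrative and are not taken from PlaScope or from the benchmark.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts, with 'plasmid' as the positive class (assumption)."""
    tp: int  # plasmid contigs predicted as plasmid
    fp: int  # chromosomal contigs predicted as plasmid
    tn: int  # chromosomal contigs predicted as chromosome
    fn: int  # plasmid contigs predicted as chromosome


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Compute the standard definitions of the four metrics."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "recall": _ratio(c.tp, c.tp + c.fn),       # sensitivity
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "accuracy": _ratio(c.tp + c.tn, total),
    }


# Made-up counts for illustration only; these are NOT results from the study.
example = ConfusionCounts(tp=8, fp=2, tn=88, fn=2)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

A classifier can score well on one of these measures while doing poorly on another (for example, high recall with low precision when many chromosomal contigs are called plasmid), which is why all four are reported together.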
Data summary
1. All the genomes were downloaded from the National Center for Biotechnology Information Sequence Read Archive and Genome database (Supplementary Tables 1 and 2).
2. The source code of PlaScope is available on GitHub (https://github.com/GuilhemRoyer/PlaScope).
Importance
Plasmid exploration can be of great interest, since these replicons are pivotal in the adaptation of bacteria to their environment. They are involved in the exchange of many genes within and between species, with a significant impact on antibiotic resistance and virulence in particular. However, plasmid characterization has long been a laborious task, requiring, for example, complex conjugation or electroporation experiments. With the advent of whole-genome sequencing techniques, access to these sequences is now potentially easier, provided that appropriate tools are available. Many software tools have been developed to explore the plasmidome of a large variety of bacteria, but they have rarely managed to combine sensitivity and specificity. Here, we focus on a single species, E. coli, and we use the large amount of available data to overcome this problem. With our tool, called PlaScope, we achieve high performance compared with two other classifiers, Plasflow and cBar, and we demonstrate the utility of such an approach to determine the location of virulence or resistance genes. We think that PlaScope could be very useful in the ana...