The isolation of microorganisms from microbial community samples often yields a large number of conspecific isolates. Increasing the diversity covered by an isolate collection entails the implementation of methods and protocols to minimize the number of redundant isolates. Matrix-assisted laser desorption–ionization time-of-flight (MALDI-TOF) mass spectrometry methods are ideally suited to this dereplication problem because of their low cost and high throughput. However, the available software tools are cumbersome and rely either on the prior development of reference databases or on global similarity analyses, which are inconvenient and offer low taxonomic resolution. We introduce SPeDE, a user-friendly spectral data analysis tool for the dereplication of MALDI-TOF mass spectra. Rather than relying on global similarity approaches to classify spectra, SPeDE determines the number of unique spectral features by a mix of global and local peak comparisons. This approach allows the identification of a set of nonredundant spectra linked to operational isolation units. We evaluated SPeDE on a data set of 5,228 spectra representing 167 bacterial strains belonging to 132 genera across six phyla and on a data set of 312 spectra of 78 strains measured before and after lyophilization and subculturing. SPeDE was able to dereplicate with high efficiency by identifying redundant spectra while retrieving reference spectra for all strains in a sample. SPeDE can identify distinguishing features between spectra, and its performance exceeds that of established methods in speed and precision. SPeDE is open source under the MIT license and is available from https://github.com/LM-UGent/SPeDE.
IMPORTANCE Estimation of the operational isolation units present in a MALDI-TOF mass spectral data set involves an essential dereplication step to identify redundant spectra in a rapid manner and without sacrificing biological resolution. We describe SPeDE, a new algorithm which facilitates culture-dependent clinical or environmental studies. SPeDE enables the rapid analysis and dereplication of isolates, a critical feature when long-term storage of cultures is limited or not feasible. We show that SPeDE can efficiently identify sets of similar spectra at the level of the species or strain, exceeding the taxonomic resolution of other methods. The high-throughput capacity, speed, and low cost of MALDI-TOF mass spectrometry and SPeDE dereplication over traditional gene marker-based sequencing approaches should facilitate adoption of the culturomics approach to bacterial isolation campaigns.
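The dereplication idea described above, counting the peaks a spectrum has that a reference lacks and treating a spectrum with no such peaks as redundant, can be illustrated with a minimal sketch. This is not the SPeDE implementation: the greedy strategy, the function names (`has_match`, `unique_peaks`, `dereplicate`), the 700 ppm tolerance, and the `max_unique` cutoff are all assumptions chosen for illustration, and the peak lists are invented toy values rather than measured spectra.

```python
from typing import Dict, List


def has_match(mz: float, peaks: List[float], ppm: float) -> bool:
    """Return True if any peak in `peaks` lies within `ppm` parts per million of `mz`."""
    return any(abs(mz - p) <= mz * ppm * 1e-6 for p in peaks)


def unique_peaks(query: List[float], reference: List[float], ppm: float) -> List[float]:
    """Peaks of `query` that have no counterpart in `reference` (local comparison)."""
    return [mz for mz in query if not has_match(mz, reference, ppm)]


def dereplicate(
    spectra: Dict[str, List[float]],
    ppm: float = 700.0,   # hypothetical mass tolerance, not taken from the paper
    max_unique: int = 0,  # hypothetical cutoff: unique peaks allowed in a redundant spectrum
) -> Dict[str, List[str]]:
    """Greedy dereplication sketch.

    Spectra are visited from the richest peak list to the poorest (ties broken
    by name). A spectrum joins the first reference against which it shows at
    most `max_unique` unique peaks; otherwise it becomes a new reference.
    Returns a mapping of reference name -> names of the spectra it represents.
    """
    groups: Dict[str, List[str]] = {}
    for name in sorted(spectra, key=lambda n: (-len(spectra[n]), n)):
        for ref in groups:
            if len(unique_peaks(spectra[name], spectra[ref], ppm)) <= max_unique:
                groups[ref].append(name)
                break
        else:
            # No existing reference explains this spectrum: keep it as a new one.
            groups[name] = [name]
    return groups


# Toy peak lists (m/z values), invented for illustration only.
example = {
    "iso1": [3000.0, 4500.0, 6200.0, 7800.0],
    "iso2": [3000.5, 4500.3, 6200.0],  # every peak has a counterpart in iso1
    "iso3": [3000.0, 5100.0, 6900.0],  # two peaks absent from iso1
}
groups = dereplicate(example)
```

With these toy inputs, `iso2` is absorbed by the reference `iso1`, whereas `iso3` carries two peaks that `iso1` lacks and is therefore kept as a second reference. Visiting the richest spectra first is a design choice of this sketch, made so that peak-poor measurements of the same isolate are not promoted to references.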
We performed a taxonomic and comparative genomics analysis of 67 novel Paraburkholderia isolates from forest soil. Phylogenetic analysis of the recA gene revealed that these isolates formed a coherent lineage within the genus Paraburkholderia that also included Paraburkholderia aspalathi, Paraburkholderia madseniana, Paraburkholderia sediminicola, Paraburkholderia caffeinilytica, Paraburkholderia solitsugae and Paraburkholderia elongata and four unidentified soil isolates from earlier studies. A phylogenomic analysis, along with orthoANIu and digital DNA–DNA hybridization calculations, revealed that they represented four different species including three novel species and P. aspalathi. Functional genome annotation of the strains revealed several pathways for aromatic compound degradation and the presence of mono- and dioxygenases involved in the degradation of the lignin-derived compounds ferulic acid and p-coumaric acid. This co-occurrence of multiple Paraburkholderia strains and species with the capacity to degrade aromatic compounds in pristine forest soil is likely caused by the abundant presence of aromatic compounds in decomposing plant litter and may highlight a diversity in micro-habitats or be indicative of synergistic relationships. We propose to classify the isolates representing novel species as Paraburkholderia domus with LMG 31832T (=CECT 30334) as the type strain, Paraburkholderia nemoris with LMG 31836T (=CECT 30335) as the type strain and Paraburkholderia haematera with LMG 31837T (=CECT 30336) as the type strain and provide an emended description of Paraburkholderia sediminicola Lim et al. 2008.