Species delineation based on bacterial genomes is an essential part of prokaryote research. In silico genome-to-genome comparison methods are computationally demanding, but much less tedious and error-prone than wet-lab methods. In this paper, we present a novel method for the delineation of bacterial genomes based on genomic signal processing. The proposed method uses two numerical representations of whole bacterial genomes, the phase signal and the cumulated phase signal, from which four parameters are derived for each genome. The parameters characterize a genome, and their calculation is independent of the other genomes in a delineation dataset. The delineation itself is performed by calculating the average similarity of the parameters. The method was statistically verified on 1826 bacterial genomes. A similarity threshold of 96% was set based on the receiver operating characteristic curve, which featured a sensitivity of 99.78% and a specificity of 97.25%. Additionally, a comparative analysis with standard delineation tools was conducted on another 33 bacterial genomes, as these tools were not able to process the dataset of 1826 genomes on a desktop computer. The proposed method achieved comparable or better delineation results than the standard tools. Besides the excellent delineation results, another great advantage of the method is its low computational demands, which enable the delineation of thousands of genomes on a desktop computer. The calculation of the parameters takes tens of minutes for thousands of genomes. Moreover, the parameters can be calculated in advance and stored in a database, meaning the delineation itself is then completed in a matter of seconds.
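The idea of a cumulated phase signal and a parameter-based similarity can be illustrated with a minimal sketch. The abstract does not state the nucleotide-to-phase mapping, the four parameters, or the similarity formula, so everything below is an assumption: `PHASE` follows one common complex-plane convention from genomic signal processing, and `average_similarity` and `same_species` are hypothetical helpers that compare arbitrary parameter vectors. Only the 96% threshold is taken from the text.

```python
import math

# Assumed mapping of nucleotides to phase angles (radians); not given in the abstract.
PHASE = {"A": math.pi / 4, "C": -3 * math.pi / 4,
         "G": 3 * math.pi / 4, "T": -math.pi / 4}


def cumulated_phase(seq):
    """Running sum of per-nucleotide phases; unknown symbols contribute 0."""
    total = 0.0
    signal = []
    for base in seq.upper():
        total += PHASE.get(base, 0.0)
        signal.append(total)
    return signal


def average_similarity(params_a, params_b):
    """Hypothetical similarity: mean of 1 - |a - b| / max(|a|, |b|) per parameter."""
    if len(params_a) != len(params_b):
        raise ValueError("parameter vectors must have the same length")
    sims = []
    for a, b in zip(params_a, params_b):
        scale = max(abs(a), abs(b))
        # Two zero-valued parameters are treated as identical.
        sims.append(1.0 if scale == 0 else 1.0 - abs(a - b) / scale)
    return sum(sims) / len(sims)


def same_species(params_a, params_b, threshold=0.96):
    """Apply the 96% similarity threshold reported in the abstract."""
    return average_similarity(params_a, params_b) >= threshold
```

Because each genome's parameters depend only on that genome, they could be computed once and cached, which is consistent with the precomputed-database workflow the abstract describes.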
Background: Pathogenic treponemes related to Treponema pallidum are both human pathogens (causing syphilis, yaws, and bejel) and animal pathogens (causing infections of primates and venereal spirochetosis in rabbits). A set of 11 treponemal genome sequences, including those of five Treponema pallidum ssp. pallidum (TPA) strains (Nichols, DAL-1, Mexico A, SS14, Chicago), four T. p. ssp. pertenue (TPE) strains (CDC-2, Gauthier, Samoa D, Fribourg-Blanc), one T. p. ssp. endemicum (TEN) strain (Bosnia A), and one strain (Cuniculi A) of Treponema paraluisleporidarum ecovar Cuniculus (TPeC), were tested for the presence of positively selected genes. Methodology/Principal findings: A total of 1068 orthologous genes annotated in all 11 genomes were tested for the presence of positively selected genes using both site and branch-site models with CODEML (PAML package). Subsequent analyses with sequences obtained from 62 treponemal draft genomes were used for the identification of positively selected amino acid positions. Synthetic biotinylated peptides were designed to cover positively selected protein regions, and these peptides were tested for reactivity with sera from patients with syphilis. Altogether, 22 positively selected genes were identified in the TP genomes, and the TPA sets of positively selected genes differed from the TPE sets. While genetic variability among TPA strains was predominantly present in a number of genetic loci, genetic variability within TPE and TEN strains was distributed more equally along the chromosome. Several syphilitic sera were shown to react with some peptides derived from the protein sequences evolving under positive selection. Conclusions/Significance: The syphilis-, yaws-, and bejel-causing strains differed relative to sets of positively selected genes. Most of the positively selected chromosomal loci were identified among the TPA treponemes.
The local accumulation of genetic variability suggests that the diversification of TPA strains took place predominantly in a limited number of genomic regions compared to the more dispersed genetic diversity differentiating TPE and TEN strains. The identification of positively selected sites in tpr genes and genes encoding outer membrane proteins suggests their role during infection of human and animal hosts. The driving force for adaptive evolution at these loci thus appears to be the host immune response as supported by observed reactivity of syphilitic sera with some peptides derived from protein sequences showing adaptive evolution.
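Tests for positive selection with CODEML site models are usually evaluated with a likelihood ratio test between a null model and a model that allows positive selection. The sketch below shows that arithmetic only. It assumes a comparison with two degrees of freedom, a common choice for nested site-model pairs; the abstract does not state which model pairs or degrees of freedom were used. The function name and the log-likelihood values in the usage line are hypothetical.

```python
import math


def lrt_df2(lnl_null, lnl_alt):
    """Likelihood ratio test assuming 2 degrees of freedom (an assumption).

    Returns the test statistic 2 * (lnL_alt - lnL_null) and its p-value.
    For a chi-square distribution with 2 degrees of freedom, the
    survival function has the closed form exp(-x / 2).
    """
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)  # clamp small negative noise
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Hypothetical log-likelihoods for a single gene, for illustration only.
stat, p_value = lrt_df2(-1000.0, -995.0)
```

With more than a thousand genes tested, the resulting p-values would normally also need a multiple-testing correction, which this sketch does not cover.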
Bioinformatics may seem to be a scientific field that primarily processes large string datasets, as nucleotides and amino acids are represented with dedicated characters. On the other hand, many computational tasks that bioinformatics addresses are mathematical problems that can be understood as operations on numbers. In fact, many computational tasks are solved this way in the background. One of the most widely used numerical representations maps nucleotides and amino acids to the integers 0–3 and 0–20, respectively. The limitation of this mapping becomes apparent when the digital signal of nucleotides has to be translated into a digital signal of amino acids, because the genetic code is degenerate. This causes non-monotonicities in the mapping function. Although a map for reducing this undesirable effect has already been proposed, it is defined theoretically and for the standard genetic code only. In this study, we derived a novel optimality criterion for reducing the influence of degeneracy by utilizing a large dataset of real sequences with various genetic codes. As a result, we proposed a new robust globally optimal map suitable for any genetic code, as well as specialized optimal maps for particular genetic codes.