Background Bacterial genomes are being deposited into online databases at an increasing rate. Genome annotation represents one of the first efforts to understand organisms and their diseases. Some evolutionary relationships that can be annotated from genomes alone are conserved gene neighbourhoods (CNs), phylogenetic profiles (PPs), and gene fusions. At present, there is no standalone software that creates networks of interactions among proteins from these three evolutionary characteristics efficiently and effectively.
Results We developed the GENPPI software for the ab initio prediction of interaction networks using predicted proteins from a genome. In our case study, we employed 50 genomes of the genus Corynebacterium. Based on the PP relationship, GENPPI differentiated genomes between the ovis and equi biovars of the species Corynebacterium pseudotuberculosis and created groups among the other species analysed. When we inspected only the CN relationship, we could separate species but not entirely separate biovars. GENPPI is efficient: for example, it creates interaction networks from the central genomes of 50 species/lineages with an average size of 2200 genes in less than 40 min on a conventional computer. Moreover, the interaction networks that our software creates reflect correct evolutionary relationships between species, which we confirmed with average nucleotide identity analyses. Additionally, the software lets the user define, through various parameters, how he or she intends to explore the PP and CN characteristics, enabling the creation of customized interaction networks. For instance, users can set parameters regarding the genus, metagenome, or pangenome. Beyond parameterizing GENPPI, the user also chooses which set of genomes to study.
Conclusions GENPPI can help fill the gap between the considerable number of novel genomes assembled monthly and our ability to process interaction networks that consider the noncore genes for all completed genome versions. With GENPPI, the user decides how many genomes, and how evolutionarily related, are used to answer a scientific query.
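The phylogenetic-profile relationship mentioned above can be illustrated with a minimal Python sketch. This is not GENPPI's algorithm or interface: the function names, the `min_genomes` threshold, and the toy genome and family identifiers are all hypothetical, and the sketch assumes proteins have already been grouped into families. It links two families only when they occur in exactly the same subset of genomes.

```python
from itertools import combinations


def phylogenetic_profiles(presence):
    """Build one presence/absence vector per protein family.

    presence: dict mapping a genome name to the set of family ids found in it
    (hypothetical input format chosen for this sketch).
    Returns a dict mapping each family id to a tuple of 0/1 values, one per
    genome, with genomes taken in sorted order.
    """
    genomes = sorted(presence)
    families = set().union(*presence.values())
    return {
        fam: tuple(int(fam in presence[g]) for g in genomes)
        for fam in sorted(families)
    }


def profile_edges(profiles, min_genomes=2):
    """Link families that share an identical, informative profile.

    A profile is treated as uninformative (an assumption of this sketch) when
    the family is present in every genome or in fewer than min_genomes genomes.
    """
    edges = []
    for (fam_a, prof_a), (fam_b, prof_b) in combinations(profiles.items(), 2):
        informative = min_genomes <= sum(prof_a) < len(prof_a)
        if prof_a == prof_b and informative:
            edges.append((fam_a, fam_b))
    return edges


# Toy data: four hypothetical genomes and four hypothetical families.
toy_genomes = {
    "genome_A": {"fam1", "fam2", "fam3"},
    "genome_B": {"fam1", "fam2", "fam3"},
    "genome_C": {"fam3", "fam4"},
    "genome_D": {"fam3"},
}
print(profile_edges(phylogenetic_profiles(toy_genomes)))
# → [('fam1', 'fam2')]
```

Here `fam3` is present everywhere and `fam4` in a single genome, so neither contributes an edge; only `fam1` and `fam2`, which co-occur in the same two genomes, are connected. A real tool would typically tolerate near-identical profiles rather than require exact matches.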
Aims To develop a species‐specific multiplex PCR (mPCR) that correctly identifies Edwardsiella species in routine diagnostics of bacterial fish diseases.
Methods and Results The genomes of 62 Edwardsiella spp. isolates available from the National Center for Biotechnology Information (NCBI) database were subjected to taxonomic and pan‐genomic analyses to identify unique regions that could be exploited by species‐specific PCR. The designed primers were tested against isolated Edwardsiella spp. strains, revealing errors in commercial biochemical tests for bacterial classification regarding Edwardsiella species.
Conclusion Some of the genomes of Edwardsiella spp. in the NCBI platform were incorrectly classified, which can lead to errors in some research. A functional mPCR was developed to differentiate between phenotypically and genetically ambiguous Edwardsiella, with which we detected the presence of Edwardsiella anguillarum affecting fish in Brazil.
Significance and Impact of the Study This study shows that the misclassification of Edwardsiella spp. in Brazil concealed the presence of E. anguillarum in South America. This review of the taxonomic classification of the Edwardsiella genus also helps researchers with the sequencing and identification of genomes: it points out misclassifications in online databases that must be corrected and provides an easy end‐point mPCR assay to characterize Edwardsiella species.
Understanding protein secretion pathways is of paramount importance in studying diseases caused by bacteria and their respective treatments. Most such pathways rely on signals that mark a protein for secretion. However, some proteins, known as non-classical secreted proteins, carry no such signals. This study aims to classify such proteins using predictive machine-learning techniques. Guided by the literature, we collected a set of physicochemical characteristics of amino acids from the AAindex site that capture known protein properties, such as hydrophobicity. In this work, we developed a six-step method (alignment, preliminary classification, mean outliers, two clustering algorithms, and random choice) to filter data from raw genomes and compose a negative dataset, in contrast to a positive dataset of 141 proteins also gathered from the literature. Using a conventional Random Forest machine-learning algorithm, we obtained an accuracy of ~91% in classifying non-classical secreted proteins in a validation dataset with 14 positive and 92 negative proteins, with sensitivity and specificity of 91% and ~86%, respectively. This performance is comparable to the state of the art for non-classical secretion classification, while the less sophisticated algorithm allows us to classify bacterial proteins concerning secretion by non-classical pathways more rapidly. Therefore, this research shows that selecting an appropriate set of descriptors and an expressive training dataset compensates for not using an advanced machine-learning algorithm when predicting secretion by non-classical pathways. The data and software from this work, available at https://github.com/santosardr/non-CSPs, can be downloaded for standalone use without needing third-party software.
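To make the descriptor idea concrete, the following minimal Python sketch turns a protein sequence into a single hydrophobicity feature. It assumes the Kyte-Doolittle hydropathy scale, one of the indices catalogued in AAindex; the abstract names hydrophobicity only as an example and does not say which scales were used, so the choice of scale and the function name `mean_hydrophobicity` are illustrative assumptions rather than the authors' method.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
# Using this particular scale is an assumption made for illustration.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(sequence):
    """Return the average hydropathy of a protein sequence.

    Residues outside the 20 standard amino acids (for example 'X') are
    skipped. Raises ValueError if no scorable residue remains.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("sequence contains no standard amino acids")
    return sum(values) / len(values)


# A hydrophobic stretch scores high, a charged stretch scores low.
print(mean_hydrophobicity("AILV"))
print(mean_hydrophobicity("KRDE"))
```

One such value per physicochemical scale yields a fixed-length feature vector for each protein, which is the kind of input a Random Forest classifier expects.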
Motivation Bacterial genomes are being deposited into online databases at an increasing rate. Genome annotation represents one of the first efforts to understand organisms and their diseases. Some evolutionary relationships that are capable of being annotated only from genomes are conserved gene neighbourhoods (CNs), phylogenetic profiles (PPs), and gene fusions. At present, there is no standalone software that enables networks of interactions among proteins to be created using these three evolutionary characteristics with efficient and effective results.
Results We developed GENPPI software for the ab initio prediction of interaction networks using predicted proteins from a genome. In our case study, we employed 50 genomes of the genus Corynebacterium. Based on the PP relationship, GENPPI differentiated genomes between the ovis and equi biovars of the species Corynebacterium pseudotuberculosis and created groups among the other species analysed. If we inspected only the CN relationship, we could not entirely separate biovars, only species. Our software GENPPI was determined to be efficient because, for example, it creates interaction networks from the central genomes of 50 species/lineages with an average size of 2200 genes in less than 40 minutes on a conventional computer. Our software is compelling because the interaction networks that it creates reflect evolutionary relationships among species and were obtained in average nucleotide identity (ANI) analyses. Additionally, this software enables the user to define how he or she intends to explore the PP and CN characteristics through various parameters, enabling the creation of customized interaction networks. For instance, users can set parameters regarding the genus, metagenome, or pangenome. In addition to the parameterization of GENPPI, it is also the user’s choice regarding which set of genomes he or she is going to study.
Availability The source code in the Common Lisp language, binary files for different operating systems, and GENPPI software tutorials are available at github.com/santosardr/genppi.
Contact santosardr@ufu.br
Supplementary information Supplementary data are available at Bioinformatics online.