Sequencing of bacterial and archaeal genomes has revolutionized our understanding of the many roles played by microorganisms [1]. There are now nearly 1,000 completed bacterial and archaeal genomes available [2], most of which were chosen for sequencing on the basis of their physiology. As a result, the perspective provided by the currently available genomes is limited by a highly biased phylogenetic distribution [3–5]. To explore the value added by choosing microbial genomes for sequencing on the basis of their evolutionary relationships, we have sequenced and analysed the genomes of 56 culturable species of Bacteria and Archaea selected to maximize phylogenetic coverage. Analysis of these genomes demonstrated pronounced benefits (compared to an equivalent set of genomes randomly selected from the existing database) in diverse areas including the reconstruction of phylogenetic history, the discovery of new protein families and biological properties, and the prediction of functions for known genes from other organisms. Our results strongly support the need for systematic ‘phylogenomic’ efforts to compile a phylogeny-driven ‘Genomic Encyclopedia of Bacteria and Archaea’ in order to derive maximum knowledge from existing microbial genome data as well as from genome sequences to come.
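As a rough illustration of what selecting genomes "to maximize phylogenetic coverage" can look like, the sketch below uses a simple farthest-first heuristic over pairwise distances. This is an assumption made for illustration and is not the selection procedure used in the study; the taxon names, distances, and the function name `farthest_first` are all hypothetical.

```python
from typing import Dict, List, Tuple


def farthest_first(
    dist: Dict[Tuple[str, str], float],
    taxa: List[str],
    seed: str,
    k: int,
) -> List[str]:
    """Greedily pick k taxa, each time adding the taxon whose nearest
    already-chosen neighbour is as far away as possible.

    This is a stand-in heuristic for "phylogenetic coverage", not a
    true phylogenetic-diversity (branch-length) calculation.
    """

    def d(a: str, b: str) -> float:
        # Distances are stored once per unordered pair.
        if a == b:
            return 0.0
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    chosen = [seed]
    while len(chosen) < min(k, len(taxa)):
        candidates = [t for t in taxa if t not in chosen]
        best = max(candidates, key=lambda t: min(d(t, c) for c in chosen))
        chosen.append(best)
    return chosen


# Hypothetical toy data: A1 and A2 are close relatives; B1 and C1 are distant.
toy_taxa = ["A1", "A2", "B1", "C1"]
toy_dist = {
    ("A1", "A2"): 0.1,
    ("A1", "B1"): 0.8,
    ("A1", "C1"): 0.9,
    ("A2", "B1"): 0.7,
    ("A2", "C1"): 0.9,
    ("B1", "C1"): 0.6,
}

selection = farthest_first(toy_dist, toy_taxa, seed="A1", k=3)
print(selection)
```

With these toy distances the close relative A2 is skipped in favour of the two distant lineages, which is the intuition behind phylogeny-driven (rather than physiology-driven) genome selection.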
Brucella melitensis is a facultative intracellular bacterial pathogen that causes abortion in goats and sheep and Malta fever in humans. The genome of B. melitensis strain 16M was sequenced and found to contain 3,294,935 bp distributed over two circular chromosomes of 2,117,144 bp and 1,177,787 bp encoding 3,197 ORFs. By using the bioinformatics suite ERGO, 2,487 (78%) ORFs were assigned functions. The origins of replication of the two chromosomes are similar to those of other α-proteobacteria. Housekeeping genes, including those involved in DNA replication, transcription, translation, core metabolism, and cell wall biosynthesis, are distributed on both chromosomes. Type I, II, and III secretion systems are absent, but genes encoding sec-dependent, sec-independent, and flagella-specific type III, type IV, and type V secretion systems as well as adhesins, invasins, and hemolysins were identified. Several features of the B. melitensis genome are similar to those of the symbiotic Sinorhizobium meliloti.
We present GenePRIMP (Gene Prediction IMprovement Pipeline, http://geneprimp.jgipsf.org), a computational process that performs evidence-based evaluation of gene models in prokaryotic genomes and reports anomalies including inconsistent start sites, missed genes, and split genes. We show that manual curation of gene models using the anomaly reports generated by GenePRIMP improves their quality and demonstrate the applicability of GenePRIMP in improving finishing quality and comparing different genome sequencing and annotation technologies.
More than 1000 microbial genomes have been completely sequenced to date [1]. The increasing number of sequencing projects driven by high-throughput sequencing technologies has further underscored the importance of computational methods in annotating and mining genomic data. For any genome, gene finding is the key step to understanding the biochemistry, physiology, and ecology of the organism. Gene finding relies heavily on computational methods and very few sequencing projects are complemented by the experimental verification of computationally predicted genes through functional genomics experiments or mapping of N-terminal sequences [2,3]. Together with multiple sequencing technologies, multiple gene finders, and somewhat imprecise standards for the identification of genes, this can result in different researchers arriving at substantially varying gene models for the same organism [4] (Fig. 1, Table 1). Consequently, higher standards of accuracy are required for computational gene prediction tools.
The most popular gene finders are ab initio and work by statistically profiling protein-coding, intergenic, and boundary regions using a variety of classifiers. While most ab initio gene callers boast an average accuracy of 90% or better [5][6][7], accuracy can be compromised by many factors such as genomic islands of differing GC content, pseudogenes, and genes with programmed or artificial frameshifts, leading to sizeable variability between their gene model predictions. To improve gene models generated by ab initio predictions, some tools include heuristics and post-processing steps such as overlap removal, translation initiation site adjustment, and frameshift detection [8,9], while others rely on the presence of sequenced close relatives [10] or experimental evidence [11,12]. However, many of these post-processing tools have been tested only on metazoan genomes and use criteria that are not applicable to prokaryotes, and/or are too slow or expensive to perform on a large number of microbial genomes.
To overcome these limitations of ab initio gene prediction methods, and to address the problem of large variation among their gene models, we have devised GenePRIMP, a computational evidence-based post-processing pipeline that identifies erroneously predicted genes. Manual correction of GenePRIMP-reported genes results in a standardized output gene complement for an organism (sequence) irrespective of the method used for initial gene predictions (Fig.
The complete genomic sequence of Pseudomonas syringae pv. syringae B728a (Pss B728a) has been determined and is compared with that of P. syringae pv. tomato DC3000 (Pst DC3000). The two pathovars of this economically important species of plant pathogenic bacteria differ in host range and other interactions with plants, with Pss having a more pronounced epiphytic stage of growth and higher abiotic stress tolerance and Pst DC3000 having a more pronounced apoplastic growth habitat. The Pss B728a genome (6.1 Mb) contains a circular chromosome and no plasmid, whereas the Pst DC3000 genome is 6.5 Mbp in size, composed of a circular chromosome and two plasmids. Although a high degree of similarity exists between the two sequenced Pseudomonads, 976 protein-encoding genes are unique to Pss B728a when compared with Pst DC3000, including large genomic islands likely to contribute to virulence and host specificity. Over 375 repetitive extragenic palindromic sequences unique to Pss B728a when compared with Pst DC3000 are widely distributed throughout the chromosome except in 14 genomic islands, which generally had lower GC content than the genome as a whole. Content of the genomic islands varies, with one containing a prophage and another the plasmid pKLC102 of Pseudomonas aeruginosa PAO1. Among the 976 genes of Pss B728a with no counterpart in Pst DC3000 are those encoding syringopeptin, syringomycin, indole acetic acid biosynthesis, arginine degradation, and production of ice nuclei. The genomic comparison suggests that several genes unique to Pss B728a, such as those for ectoine synthase, DNA repair, and antibiotic production, may contribute to the epiphytic fitness and stress tolerance of this organism.
virulence genes | epiphyte | plant pathogen
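The observation that genomic islands tend to have lower GC content than the genome as a whole can be illustrated with a small windowed GC calculation. The sketch below is a minimal example under stated assumptions, not the method used in the study: the window size, the `margin` threshold, the function names, and the toy sequence are all hypothetical, and real island detection draws on additional evidence.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def low_gc_windows(
    seq: str, window: int, margin: float
) -> List[Tuple[int, int, float]]:
    """Return (start, end, gc) for non-overlapping windows whose GC
    content falls more than `margin` below the whole-sequence GC content.

    `margin` is an arbitrary illustrative threshold.
    """
    genome_gc = gc_content(seq)
    flagged = []
    for start in range(0, len(seq) - window + 1, window):
        window_gc = gc_content(seq[start:start + window])
        if window_gc < genome_gc - margin:
            flagged.append((start, start + window, window_gc))
    return flagged


# Hypothetical toy sequence: a GC-rich backbone with an AT-rich insert.
toy = "GCGCGCGCGC" + "ATATATATAT" + "GCGCGCGCGC"
flagged = low_gc_windows(toy, window=10, margin=0.2)
print(flagged)
```

On the toy sequence only the AT-rich middle window is flagged, which mirrors how a region of atypical base composition stands out against the genome-wide average.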