Background: The opportunities for bacterial population genomics that are being realised by the application of parallel nucleotide sequencing require novel bioinformatics platforms. These must be capable of the storage, retrieval, and analysis of linked phenotypic and genotypic information in an accessible, scalable and computationally efficient manner.
Results: The Bacterial Isolate Genome Sequence Database (BIGSDB) is a scalable, open source, web-accessible database system that meets these needs, enabling phenotype and sequence data, which can range from a single sequence read to whole genome data, to be efficiently linked for a limitless number of bacterial specimens. The system builds on the widely used mlstdbNet software, developed for the storage and distribution of multilocus sequence typing (MLST) data, and incorporates the capacity to define and identify any number of loci and genetic variants at those loci within the stored nucleotide sequences. These loci can be further organised into 'schemes' for isolate characterisation or for evolutionary or functional analyses. Isolates and loci can be indexed by multiple names and any number of alternative schemes can be accommodated, enabling cross-referencing of different studies and approaches. LIMS functionality of the software enables linkage to and organisation of laboratory samples. The data are easily linked to external databases and fine-grained authentication of access permits multiple users to participate in community annotation by setting up or contributing to different schemes within the database. Some of the applications of BIGSDB are illustrated with the genera Neisseria and Streptococcus. The BIGSDB source code and documentation are available at http://pubmlst.org/software/database/bigsdb/.
Conclusions: Genomic data can be used to characterise bacterial isolates in many different ways but it can also be efficiently exploited for evolutionary or functional studies. BIGSDB represents a freely available resource that will assist the broader community in the elucidation of the structure and function of bacteria by means of a population genomics approach.
The PubMLST.org website hosts a collection of open-access, curated databases that integrate population sequence data with provenance and phenotype information for over 100 different microbial species and genera. Although the PubMLST website was conceived as part of the development of the first multi-locus sequence typing (MLST) scheme in 1998, the software it uses, the Bacterial Isolate Genome Sequence database (BIGSdb, published in 2010), enables PubMLST to include all levels of sequence data, from single gene sequences up to and including complete, finished genomes. Here we describe developments in the BIGSdb software made from publication to June 2018 and show how the platform realises microbial population genomics for a wide range of applications. The system is based on the gene-by-gene analysis of microbial genomes, with each deposited sequence annotated and curated to identify the genes present and systematically catalogue their variation. Although the system was originally intended as a means of characterising isolates with typing schemes, its synthesis of sequences and records of genetic variation with provenance and phenotype data provides a highly scalable means (whole genome sequence data for tens of thousands of isolates) of addressing a wide range of functional questions, including: the prediction of antimicrobial resistance; likely cross-reactivity with vaccine antigens; and the functional activities of different variants that lead to key phenotypes. There are no limitations to the number of sequences, genetic loci, allelic variants or schemes (combinations of loci) that can be included, enabling each database to represent an expanding catalogue of the genetic variation of the population in question. 
In addition to providing web-accessible analyses and links to third-party analysis and visualisation tools, the BIGSdb software includes a RESTful application programming interface (API) that enables access to all the underlying data for third-party applications and data analysis pipelines.
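The gene-by-gene idea described above, in which each isolate is recorded as an allele designation per locus and loci are grouped into schemes, can be illustrated with a minimal Python sketch. This is not BIGSdb code: the locus names, isolate names, allele identifiers and the function names `allelic_distance` and `distance_matrix` are all hypothetical, and the handling of missing designations is an assumption made for the example.

```python
from itertools import combinations

# Hypothetical scheme: a named combination of loci (illustrative names only).
SCHEME = ["locus_A", "locus_B", "locus_C", "locus_D"]

# Hypothetical isolates: one allele identifier per locus.
# None marks a locus with no allele designation for that isolate.
ISOLATES = {
    "isolate_1": {"locus_A": 1, "locus_B": 3, "locus_C": 2, "locus_D": 5},
    "isolate_2": {"locus_A": 1, "locus_B": 4, "locus_C": 2, "locus_D": 5},
    "isolate_3": {"locus_A": 2, "locus_B": 4, "locus_C": None, "locus_D": 7},
}


def allelic_distance(a, b, loci):
    """Count scheme loci at which both isolates are designated and differ."""
    differences = 0
    for locus in loci:
        x, y = a.get(locus), b.get(locus)
        if x is None or y is None:
            # Assumption for this sketch: missing data is ignored, not penalised.
            continue
        if x != y:
            differences += 1
    return differences


def distance_matrix(isolates, loci):
    """Pairwise allelic distances, keyed by a sorted pair of isolate names."""
    return {
        (i, j): allelic_distance(isolates[i], isolates[j], loci)
        for i, j in combinations(sorted(isolates), 2)
    }


if __name__ == "__main__":
    for pair, dist in distance_matrix(ISOLATES, SCHEME).items():
        print(pair, dist)
```

Because a scheme is just a list of loci, the same isolate records can be compared under any number of alternative schemes by passing a different list.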
The eubacterial genus Wolbachia comprises one of the most abundant groups of obligate intracellular bacteria, and it has a host range that spans the phyla Arthropoda and Nematoda. Here we developed a multilocus sequence typing (MLST) scheme as a universal genotyping tool for Wolbachia. Internal fragments of five ubiquitous genes (gatB, coxA, hcpA, fbpA, and ftsZ) were chosen, and primers that amplified across the major Wolbachia supergroups found in arthropods, as well as other divergent lineages, were designed. A supplemental typing system using the hypervariable regions of the Wolbachia surface protein (WSP) was also developed. Thirty-seven strains belonging to supergroups A, B, D, and F obtained from singly infected hosts were characterized by using MLST and WSP. The number of alleles per MLST locus ranged from 25 to 31, and the average levels of genetic diversity among alleles were 6.5% to 9.2%. A total of 35 unique allelic profiles were found. The results confirmed that there is a high level of recombination in chromosomal genes. MLST was shown to be effective for detecting diversity among strains within a single host species, as well as for identifying closely related strains found in different arthropod hosts. Identical or similar allelic profiles were obtained for strains harbored by different insect species and causing distinct reproductive phenotypes. Strains with similar WSP sequences can have very different MLST allelic profiles and vice versa, indicating the importance of the MLST approach for strain identification. The MLST system provides a universal and unambiguous tool for strain typing, population genetics, and molecular evolutionary studies. The central database for storing and organizing Wolbachia bacterial and host information can be accessed at http://pubmlst.org/wolbachia/.
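The notions of alleles and allelic profiles used in this abstract can be sketched in a few lines of Python. The five locus names come from the text; everything else is an assumption for illustration: the `MlstTyper` class is hypothetical, the sequences are toy strings rather than real gene fragments, and numbers are assigned in order of first appearance, whereas in a curated database allele and sequence type numbers are assigned by curators.

```python
# Loci named in the abstract above.
LOCI = ["gatB", "coxA", "hcpA", "fbpA", "ftsZ"]


class MlstTyper:
    """Toy MLST bookkeeping: unique sequences -> allele numbers -> profiles."""

    def __init__(self, loci):
        self.loci = list(loci)
        # Per locus: map each distinct sequence to its allele number.
        self.alleles = {locus: {} for locus in self.loci}
        # Map each distinct allelic profile to a sequence type (ST) number.
        self.profiles = {}

    def allele_number(self, locus, sequence):
        """Return the allele number for a sequence, assigning a new one if unseen."""
        table = self.alleles[locus]
        key = sequence.upper()
        if key not in table:
            table[key] = len(table) + 1
        return table[key]

    def type_strain(self, sequences):
        """Return (allelic profile, ST) for a dict of locus -> sequence."""
        profile = tuple(self.allele_number(l, sequences[l]) for l in self.loci)
        if profile not in self.profiles:
            self.profiles[profile] = len(self.profiles) + 1
        return profile, self.profiles[profile]


if __name__ == "__main__":
    typer = MlstTyper(LOCI)
    # Toy sequences, NOT real Wolbachia data.
    strain_a = {"gatB": "ACGT", "coxA": "GGCA", "hcpA": "TTAG",
                "fbpA": "CCGA", "ftsZ": "ATAT"}
    strain_b = dict(strain_a, coxA="GGCC")  # differs from strain_a at one locus
    print(typer.type_strain(strain_a))
    print(typer.type_strain(strain_b))
```

Two strains share a sequence type only if they carry the same allele at every locus, which is why strains with similar WSP sequences can still have very different allelic profiles.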
Multilocus sequence typing (MLST) was proposed in 1998 as a portable sequence-based method for identifying clonal relationships among bacteria. Today, in the whole-genome era of microbiology, the need for systematic, standardized descriptions of bacterial genotypic variation remains a priority. Here, to meet this need, we draw on the successes of MLST and 16S rRNA gene sequencing to propose a hierarchical gene-by-gene approach that reflects functional and evolutionary relationships and catalogues bacteria 'from domain to strain'. Our gene-based typing approach using online platforms such as the Bacterial Isolate Genome Sequence Database (BIGSdb) allows the scalable organization and analysis of whole-genome sequence data.
Advances in nucleotide-sequencing technology have provided unparalleled access to the enormous genetic diversity that has accumulated in the bacterial domain during 3.5-4 billion years of evolution [1]. Numerous sets of whole-genome sequencing (WGS) data for bacterial isolates (BOX 1) are available [2], and metagenomic studies using these technologies continue to reveal further, seemingly boundless, diversity in bacterial communities [3]. Faced with this plethora of information, microbiologists must develop structured means of describing this diversity and of linking phenotype and genotype, thereby facilitating an improved understanding of the microbiological world. Given that we have precise information on the function of only a very small proportion of bacterial genes, and no knowledge at all about most, this is a formidable, if extremely exciting, challenge.
Here, we focus primarily on pathogenic bacteria, although the concepts discussed are applicable more widely to all bacteria and archaea. Bacterial pathogens played a crucial part in the development of experimental microbiology and remain the most intensively studied prokaryotes more than 100 years later [4]. Pathogens have emerged across the diversity of the bacterial (but, interestingly, not the archaeal) domain on many occasions and are both polyphyletic and highly diverse. Thus, although pathogens represent only a tiny subset of the bacterial world, the challenges faced by the clinical microbiology laboratory are representative of those faced by microbiology as a whole.
Taxonomic and functional analyses are based on the observations that diversity among bacteria is not continuous and that distinct, stable types with particular properties exist [5]. These founding principles of microbiology [6] have been upheld by much subsequent research, but the study of such clusters remains largely descriptive, and the evolutionary mechanisms that led to cluster emergence and persistence remain incompletely understood [7,8]. Structuring is also evident within bacterial genomes, as diversity is unevenly distributed among genes
Pre-WGS cataloguing of diversity
A major advance in defining bacterial diversity was the proposal, by the late Carl Woese and colleagues, of a universal and 'natural' (that is, genealogical) classification system based on small-su...