How genomic diversity within bacterial populations originates and is maintained in the presence of frequent recombination is a central problem in understanding bacterial evolution. Natural populations of Borrelia burgdorferi, the bacterial agent of Lyme disease, consist of diverse genomic groups co-infecting single individual vertebrate hosts and tick vectors. To understand mechanisms of sympatric genome differentiation in B. burgdorferi, we sequenced and compared 23 genomes representing major genomic groups in North America and Europe. Linkage analysis of >13,500 single-nucleotide polymorphisms revealed pervasive horizontal DNA exchanges. Although three times more frequent than point mutation, recombination is localized and weakly affects genome-wide linkage disequilibrium. We show by computer simulations that, while enhancing population fitness, recombination constrains neutral and adaptive divergence among sympatric genomes through periodic selective sweeps. In contrast, simulations of frequency-dependent selection with recombination produced the observed pattern of a large number of sympatric genomic groups associated with major sequence variations at the selected locus. We conclude that negative frequency-dependent selection targeting a small number of surface-antigen loci (ospC in particular) sufficiently explains the maintenance of sympatric genome diversity in B. burgdorferi without adaptive divergence. We suggest that pervasive recombination makes it less likely for local B. burgdorferi genomic groups to achieve host specialization. B. burgdorferi genomic groups in the northeastern United States are thus best viewed as constituting a single bacterial species, whose generalist nature is a key to its rapid spread and human virulence.
Accession numbers: PRJNA3, PRJNA28633, PRJNA19839, PRJNA29359, PRJNA28629, PRJNA29357, PRJNA21003, PRJNA19835, PRJNA28627, PRJNA21001, PRJNA29361, PRJNA28621, PRJNA19837, PRJNA20999, PRJNA28631, PRJNA29363, PRJNA17057, PRJNA19841, PRJNA12554, PRJNA28625, PRJNA29573, PRJNA19843, and PRJNA28635.
1 Present address: Odum School of Ecology, University of Georgia, Athens, GA 30602.
Genetic discontinuity, the basis of biodiversity, is ubiquitous in prokaryotes as well as in eukaryotes. Most bacterial populations display a highly clonal genetic structure, in which the observable number of multilocus genotypes is far fewer than the number expected under the assumption of free recombination (Maynard Smith et al. 1993). Bacterial clonality was originally thought to result from a lack or rarity of recombination among asexually reproducing and independently evolving clones (Ochman and Selander 1984). Since then, molecular surveys of natural bacterial populations using protein electrophoresis, multilocus sequence typing (MLST), and whole-genome sequencing have revealed that horizontal genetic exchange is in fact often more frequent than point mutations in bacteria, including species known as strongly clonal (Maynard Smith et al. 1993; Feil and Spratt 2001; Didelot and Maiden 2010; Retc...
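The linkage disequilibrium mentioned above can be made concrete with the standard pairwise r² statistic between two biallelic SNPs. The following is a minimal sketch on toy data, not the authors' analysis pipeline; the function name `ld_r2` and the 0/1 allele coding are assumptions made for illustration. Because bacterial genomes are haploid, each genome contributes one allele per site.

```python
from typing import Sequence


def ld_r2(site_a: Sequence[int], site_b: Sequence[int]) -> float:
    """Pairwise r^2 between two biallelic sites coded 0/1 across haploid genomes.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if len(site_a) != len(site_b) or len(site_a) == 0:
        raise ValueError("sites must be non-empty and cover the same genomes")
    n = len(site_a)
    p_a = sum(site_a) / n  # frequency of allele 1 at site A
    p_b = sum(site_b) / n  # frequency of allele 1 at site B
    # Frequency of genomes carrying allele 1 at both sites.
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / denom


# Toy example: four genomes, two SNPs.
linked = ld_r2([0, 0, 1, 1], [0, 0, 1, 1])       # alleles always co-occur
independent = ld_r2([0, 0, 1, 1], [0, 1, 0, 1])  # all four haplotypes present
```

Under strict clonality, most site pairs would behave like the first pair; frequent recombination pushes pairs toward the second.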
Three billion years of evolution have produced a tremendous diversity of protein molecules, and yet the full potential of this molecular class is likely far greater. Accessing this potential has been challenging for computation and experiments because the space of possible protein molecules is much larger than the space of those likely to host function. Here we introduce Chroma, a generative model for proteins and protein complexes that can directly sample novel protein structures and sequences and that can be conditioned to steer the generative process towards desired properties and functions. To enable this, we introduce a diffusion process that respects the conformational statistics of polymer ensembles, an efficient neural architecture for molecular systems based on random graph neural networks that enables long-range reasoning with sub-quadratic scaling, equivariant layers for efficiently synthesizing 3D structures of proteins from predicted inter-residue geometries, and a general low-temperature sampling algorithm for diffusion models. We suggest that Chroma can effectively realize protein design as Bayesian inference under external constraints, which can involve symmetries, substructure, shape, semantics, and even natural language prompts. With this unified approach, we hope to accelerate the prospect of programming protein matter for human health, materials science, and synthetic biology.
Many applications in protein engineering require optimizing multiple protein properties simultaneously, such as binding one target but not others or binding a target while maintaining stability. Such multistate design problems require navigating a high-dimensional space to find proteins with desired characteristics. A model that relates protein sequence to functional attributes can guide design to solutions that would be hard to discover via screening. In this work, we measured thousands of protein–peptide binding affinities with the high-throughput interaction assay amped SORTCERY and used the data to parameterize a model of the alpha-helical peptide-binding landscape for three members of the Bcl-2 family of proteins: Bcl-xL, Mcl-1, and Bfl-1. We applied optimization protocols to explore extremes in this landscape to discover peptides with desired interaction profiles. Computational design generated 36 peptides, all of which bound with high affinity and specificity to just one of Bcl-xL, Mcl-1, or Bfl-1, as intended. We designed additional peptides that bound selectively to two out of three of these proteins. The designed peptides were dissimilar to known Bcl-2–binding peptides, and high-resolution crystal structures confirmed that they engaged their targets as expected. Excellent results on this challenging problem demonstrate the power of a landscape modeling approach, and the designed peptides have potential uses as diagnostic tools or cancer therapeutics.
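One way to picture the multistate design objective described above is as a search for sequences that score well for one target and poorly for the others. The sketch below assumes a purely additive, position-specific model with made-up weights; the abstract does not specify the form of the fitted model or the optimization protocol, so `TOY_MODEL`, `predicted_score`, `specificity`, and `most_specific` are all hypothetical names and values.

```python
from typing import Dict, List

# Hypothetical additive model: model[target][position][residue] -> contribution.
Model = Dict[str, List[Dict[str, float]]]

# Made-up weights for two-residue "peptides"; not fitted to any real data.
TOY_MODEL: Model = {
    "Bcl-xL": [{"A": 1.0, "L": 0.0}, {"D": 0.5, "E": 0.0}],
    "Mcl-1": [{"A": 0.0, "L": 1.0}, {"D": 0.0, "E": 0.5}],
    "Bfl-1": [{"A": 0.5, "L": 0.5}, {"D": 0.0, "E": 0.0}],
}


def predicted_score(model: Model, target: str, peptide: str) -> float:
    """Sum per-position contributions; unknown residues contribute 0."""
    return sum(pos.get(res, 0.0) for pos, res in zip(model[target], peptide))


def specificity(model: Model, target: str, peptide: str) -> float:
    """On-target score minus the best off-target score."""
    on_target = predicted_score(model, target, peptide)
    off_target = max(
        predicted_score(model, other, peptide) for other in model if other != target
    )
    return on_target - off_target


def most_specific(model: Model, target: str, candidates: List[str]) -> str:
    """Pick the candidate with the largest specificity margin for the target."""
    return max(candidates, key=lambda p: specificity(model, target, p))


candidates = ["AD", "LE", "AE", "LD"]
best_for_mcl1 = most_specific(TOY_MODEL, "Mcl-1", candidates)
best_for_bclxl = most_specific(TOY_MODEL, "Bcl-xL", candidates)
```

Subtracting the best off-target score, rather than the average, penalizes a peptide that binds even one unintended partner strongly, which matches the goal of binding just one family member.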
Rapid accumulation and availability of gene expression datasets in public repositories have enabled large-scale meta-analyses of combined data. The richness of cross-experiment data has provided new biological insights, including identification of new cancer genes. In this study, we compiled a human gene expression dataset from ∼40,000 publicly available Affymetrix HG-U133Plus2 arrays. After strict quality control and data normalisation, the data were quantified in an expression matrix of ∼20,000 genes and ∼28,000 samples. To enable different ways of sample grouping, existing annotations were subjected to systematic ontology-assisted categorisation and manual curation. Groups like normal tissues, neoplasmic tissues, cell lines, homoeotic cells and incompletely differentiated cells were created. Unsupervised analysis of the data confirmed a global structure of expression consistent with earlier analysis, but with more details revealed due to increased resolution. A suitable mixed-effects linear model was used to further investigate gene expression in solid tissue tumours, and to compare these with the respective healthy solid tissues. The analysis identified 1,285 genes with systematic expression change in cancer. The list is significantly enriched with known cancer genes from large, public, peer-reviewed databases, whereas the remaining ones are proposed as new cancer gene candidates. The compiled dataset is publicly available in the ArrayExpress Archive. It contains the most diverse collection of biological samples, making it the largest systematically annotated gene expression dataset of its kind in the public domain.