Summary
Although biosynthetic gene clusters (BGCs) have been discovered for hundreds of bacterial metabolites, our knowledge of their diversity remains limited. Here, we used a novel algorithm to systematically identify BGCs in the extensive extant microbial sequencing data. Network analysis of the predicted BGCs revealed large gene cluster families, the vast majority uncharacterized. We experimentally characterized the most prominent family, consisting of two subfamilies of hundreds of BGCs distributed throughout the Proteobacteria; their products are aryl polyenes, lipids with an aryl head group conjugated to a polyene tail. We identified a distant relationship to a third subfamily of aryl polyene BGCs, and together the three subfamilies represent the largest known family of biosynthetic gene clusters, with more than 1,000 members. Although these clusters are widely divergent in sequence, their small molecule products are remarkably conserved, indicating for the first time the important roles these compounds play in Gram-negative cell biology.
Summary
In complex biological systems, small molecules often mediate microbe-microbe and microbe-host interactions. Using a systematic approach, we identified 3,118 small molecule biosynthetic gene clusters (BGCs) in genomes of human-associated bacteria and studied their representation in 752 metagenomic samples from the NIH Human Microbiome Project. Remarkably, we discovered that BGCs for a class of antibiotics in clinical trials, thiopeptides, are widely distributed in genomes and metagenomes of the human microbiota. We purified and solved the structure of a new thiopeptide antibiotic, lactocillin, from a prominent member of the vaginal microbiota. We demonstrate that lactocillin has potent antibacterial activity against a range of Gram-positive vaginal pathogens, and we show that lactocillin and other thiopeptide BGCs are expressed in vivo by analyzing human metatranscriptomic sequencing data. Our findings illustrate the widespread distribution of small-molecule-encoding BGCs in the human microbiome, and they demonstrate the bacterial production of drug-like molecules in humans.
The gut microbiota synthesize hundreds of molecules, many of which are known to impact host physiology. Among the most abundant metabolites are the secondary bile acids deoxycholic acid (DCA) and lithocholic acid (LCA), which accumulate at ~500 µM and are known to block C. difficile growth [1], promote hepatocellular carcinoma [2], and modulate host metabolism via the GPCR TGR5 [3]. More broadly, DCA, LCA and their derivatives are a major component of the recirculating bile acid pool [4]; the size and composition of this pool are a target of therapies for primary biliary cholangitis and nonalcoholic steatohepatitis. Despite the clear impact of DCA and LCA on host physiology, incomplete knowledge of their biosynthetic genes and a lack of genetic tools in their native producer limit our ability to modulate secondary bile acid levels in the host. Here, we complete the pathway to DCA/LCA by assigning and characterizing enzymes for each of the steps in its reductive arm, revealing a strategy in which the A-B rings of the steroid core are transiently converted into an electron acceptor for two reductive steps carried out by Fe-S flavoenzymes. Using anaerobic in vitro reconstitution, we establish that a set of six enzymes is necessary and sufficient for the 8-step conversion of cholic acid to DCA. We then engineer the pathway into Clostridium sporogenes, conferring production of DCA and LCA on a non-producing commensal and demonstrating that a microbiome-derived pathway can be expressed and controlled heterologously. These data establish a complete pathway to two central components of the bile acid pool, and provide a road map for deorphaning and engineering pathways from the microbiome as a critical step toward controlling the metabolic output of the gut microbiota.
A common human gut bacterium, Bacteroides fragilis, produces a sphingolipid ligand for the conserved host receptor CD1d and can modulate natural killer T cell activity.