For the past half-century, structural biologists relied on the notion that similar protein sequences give rise to similar structures and functions. While this assumption has driven research to explore certain parts of the protein universe, it disregards spaces that don鈥檛 rely on this assumption. Here we explore areas of the protein universe where similar protein functions can be achieved by different sequences and different structures. We predict ~200,000 structures for diverse protein sequences from 1,003 representative genomes across the microbial tree of life and annotate them functionally on a per-residue basis. Structure prediction is accomplished using the World Community Grid, a large-scale citizen science initiative. The resulting database of structural models is complementary to the AlphaFold database, with regards to domains of life as well as sequence diversity and sequence length. We identify 148 novel folds and describe examples where we map specific functions to structural motifs. We also show that the structural space is continuous and largely saturated, highlighting the need for a shift in focus across all branches of biology, from obtaining structures to putting them into context and from sequence-based to sequence-structure-function based meta-omics analyses.
For the past half-century, structural biologists relied on the notion that similar protein sequences give rise to similar structures and functions. While this assumption has driven research to explore certain parts of the protein universe, it disregards spaces that don't rely on this assumption. Here we explore areas of the protein universe where similar protein functions can be achieved by different sequences and different structures. We predict ~200,000 structures for diverse protein sequences from 1,003 representative genomes1 across the microbial tree of life, and annotate them functionally on a per-residue basis. Structure prediction is accomplished using the World Community Grid, a large-scale citizen science initiative. The resulting database of structural models is complementary to the AlphaFold database, with regards to domains of life as well as sequence diversity and sequence length. We identify 161 novel folds and describe examples where we map specific functions to structural motifs. We also show that the structural space is continuous and largely saturated, highlighting the need for shifting the focus from obtaining structures to putting them into context, to transform all branches of biology, including a shift from sequence-based to sequence-structure-function based meta-omics analyses.
The past decade has seen advancement in high-throughput sequencing technologies resulting in rapid accumulation of genomic data from microbial communities. While this growth in sequence data and gene discovery is impressive, the majority of microbial gene functions remain uncharacterized.
In this study, electrogenic microbial communities originating from a single source were multiplied using our custom-made, 96-well-plate-based microbial fuel cell (MFC) array. Developed communities operated under different pH conditions and produced currents up to 19.4 A/m3 (0.6 A/m2) within 2 days of inoculation. Microscopic observations [combined scanning electron microscopy (SEM) and energy dispersive spectroscopy (EDS)] revealed that some species present in the anodic biofilm adsorbed copper on their surface because of the bioleaching of the printed circuit board (PCB), yielding Cu2 + ions up to 600 mg/L. Beta- diversity indicates taxonomic divergence among all communities, but functional clustering is based on reactor pH. Annotated metagenomes showed the high presence of multicopper oxidases and Cu-resistance genes, as well as genes encoding aliphatic and aromatic hydrocarbon-degrading enzymes, corresponding to PCB bioleaching. Metagenome analysis revealed a high abundance of Dietzia spp., previously characterized in MFCs, which did not grow at pH 4. Binning metagenomes allowed us to identify novel species, one belonging to Actinotalea, not yet associated with electrogenicity and enriched only in the pH 7 anode. Furthermore, we identified 854 unique protein-coding genes in Actinotalea that lacked sequence homology with other metagenomes. The function of some genes was predicted with high accuracy through deep functional residue identification (DeepFRI), with several of these genes potentially related to electrogenic capacity. Our results demonstrate the feasibility of using MFC arrays for the enrichment of functional electrogenic microbial consortia and data mining for the comparative analysis of either consortia or their members.
Comprehensive protein function annotation is key to understanding microbiome-related disease mechanisms in the host organisms. We have developed a new metagenome analysis workflow integrating de novo genome reconstruction, taxonomic profiling and deep learning-based functional annotations from DeepFRI. We validate DeepFRI functional annotations by comparing them to orthology-based annotations from eggNOG. Further, we demonstrate the usage of the workflow using 1,070 infant metagenome samples from the DIABIMMUNE cohort. We have generated a sequence catalogue comprising a collection of 7,174 metagenome-assembled genomes (MAGs), of which 2,255 are high quality near-complete bacterial genomes. These genomes encode 1.9 million non-redundant genes. We found high concordance (70%) between GO annotations predicted by DeepFRI and eggNOG. Importantly, we show that DeepFRI improved the annotation coverage, with nearly all the genes in the gene catalogue obtaining GO molecular function annotations, although the annotations are less specific compared to eggNOG. The pan-genome analysis of 42 bacterial species revealed a striking difference between DeepFRI and eggNOG annotations. eggNOG was shown to annotate more genes coming from well-studied organisms such as Escherichia coli while DeepFRI on the other hand is not sensitive to taxa. This work contributes to the understanding of the functional signature of the human gut microbiome.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations鈥揷itations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright 漏 2025 scite LLC. All rights reserved.
Made with 馃挋 for researchers
Part of the Research Solutions Family.