Background: Glycoside hydrolases (GH) targeting cellulose, xylan, and chitin are common in the bacterial genomes that have been sequenced. Little is known, however, about the architecture of multi-domain and multi-activity glycoside hydrolases. In these enzymes, combined catalytic domains act synergistically and thus display overall improved catalytic efficiency, making these proteins of high interest for the biofuel technology industry.
Results: Here, we identify the domain organization in 40,946 proteins targeting cellulose, xylan, and chitin derived from 11,953 sequenced bacterial genomes. These bacteria are known to be capable, or to have the potential, to degrade polysaccharides, or are newly identified potential degraders (e.g., Actinospica, Hamadaea, Cystobacter, and Microbispora). Most of the proteins we identified contain a single catalytic domain that is frequently associated with an accessory non-catalytic domain. Regarding multi-domain proteins, we found that many bacterial strains have unique GH protein architectures and that the overall protein organization is not conserved across most genera. We identified 217 multi-activity proteins with at least two GH domains for cellulose, xylan, and chitin. Of these proteins, 211 have GH domains targeting similar or associated substrates (i.e., cellulose and xylan), whereas only six proteins target both cellulose and chitin. Fifty-two percent of multi-activity GHs are hetero-GHs. Finally, GH6, −10, −44 and −48 domains were mostly C-terminal; GH9, −11, −12, and −18 were mostly N-terminal; and GH5 domains were either N- or C-terminal.
Conclusion: We identified 40,946 multi-domain/multi-activity proteins targeting cellulase, chitinase, and xylanase in bacterial genomes and proposed new candidate lineages and protein architectures for carbohydrate processing that may play a role in biofuel production.
Electronic supplementary material: The online version of this article (doi:10.1186/s13068-016-0538-6) contains supplementary material, which is available to authorized users.
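The categories used in the abstract above (multi-activity, hetero-GH, N- or C-terminal) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Domain` record, the simplified one-substrate-per-family table, the reading of "hetero-GH" as a protein carrying GH domains from more than one family, and the midpoint rule for terminal position. It is not the authors' method.

```python
from dataclasses import dataclass

# Simplified, illustrative substrate assignment (assumption): real GH families
# can be polyspecific, and the study may have used a different mapping.
SUBSTRATE_BY_FAMILY = {
    "GH5": "cellulose", "GH6": "cellulose", "GH9": "cellulose",
    "GH12": "cellulose", "GH44": "cellulose", "GH48": "cellulose",
    "GH10": "xylan", "GH11": "xylan",
    "GH18": "chitin",
}


@dataclass(frozen=True)
class Domain:
    """Hypothetical domain record: name plus 1-based residue coordinates."""
    name: str
    start: int
    end: int


def gh_domains(domains):
    """Keep only the catalytic GH domains listed in the substrate table."""
    return [d for d in domains if d.name in SUBSTRATE_BY_FAMILY]


def is_multi_activity(domains):
    """True when the protein carries at least two GH domains of interest."""
    return len(gh_domains(domains)) >= 2


def is_hetero_gh(domains):
    """True when the GH domains come from more than one family (assumed meaning)."""
    return len({d.name for d in gh_domains(domains)}) >= 2


def targeted_substrates(domains):
    """Set of substrates targeted by the protein's GH domains."""
    return {SUBSTRATE_BY_FAMILY[d.name] for d in gh_domains(domains)}


def terminal_position(domain, protein_length):
    """Assumed rule: a domain is N-terminal if its midpoint lies in the first half."""
    midpoint = (domain.start + domain.end) / 2
    return "N-terminal" if midpoint <= protein_length / 2 else "C-terminal"


# Made-up example protein: GH5, an accessory CBM2, then GH10.
protein_length = 820
protein = [Domain("GH5", 30, 320), Domain("CBM2", 350, 450), Domain("GH10", 500, 800)]
positions = {d.name: terminal_position(d, protein_length) for d in gh_domains(protein)}
```

For this made-up protein, the sketch reports a multi-activity hetero-GH targeting cellulose and xylan, with GH5 placed N-terminal and GH10 C-terminal.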
The identification of glycoside hydrolases (GHs) for efficient polysaccharide deconstruction is essential for the development of biofuels. Here, we investigate the potential of sequential HMM-profile identification for the rapid and precise identification of the multi-domain architecture of GHs from various datasets. First, as a validation, we successfully reannotated >98% of the biochemically characterized enzymes listed in the CAZy database. Next, we analyzed the 43 million non-redundant sequences from the M5nr data and identified 322,068 unique GHs. Finally, we searched 129 assembled metagenomes retrieved from MG-RAST for environmental GHs and identified 160,790 additional enzymes. Although most identified sequences corresponded to single-domain enzymes, many contained several domains, including known accessory domains and some domains never identified in association with GHs. Several sequences displayed multiple catalytic domains and few of these potential multi-activity proteins combined potentially synergistic domains. Finally, we produced and confirmed the biochemical activities of a GH5-GH10 cellulase-xylanase and a GH11-CE4 xylanase-esterase. Globally, this “gene to enzyme pipeline” provides a rationale for mining large datasets in order to identify new catalysts combining unique properties for the efficient deconstruction of polysaccharides.
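One way to picture how per-domain profile hits become a multi-domain architecture such as GH5-GH10 is the sketch below. It assumes the hits have already been parsed into simple records, and it resolves overlaps greedily by E-value. The abstract does not say how overlapping hits are handled, so the `Hit` type, the `max_evalue` threshold, and the greedy rule are all hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical parsed profile hit on one protein (1-based coordinates)."""
    domain: str
    start: int
    end: int
    evalue: float


def resolve_architecture(hits, max_evalue=1e-5):
    """Return an architecture string such as 'GH5-GH10'.

    Assumed strategy: drop hits above max_evalue, then accept hits from best
    to worst E-value, skipping any hit that overlaps one already accepted.
    """
    kept = []
    significant = [h for h in hits if h.evalue <= max_evalue]
    for hit in sorted(significant, key=lambda h: h.evalue):
        if all(hit.end < k.start or hit.start > k.end for k in kept):
            kept.append(hit)
    kept.sort(key=lambda h: h.start)  # order domains from N- to C-terminus
    return "-".join(h.domain for h in kept)


# Made-up hits: GH26 overlaps the stronger GH5 hit; CBM2 is not significant.
example_hits = [
    Hit("GH5", 20, 300, 1e-80),
    Hit("GH10", 340, 650, 1e-60),
    Hit("GH26", 30, 280, 1e-10),
    Hit("CBM2", 660, 750, 1e-3),
]
architecture = resolve_architecture(example_hits)
```

A greedy best-E-value rule is only one of several reasonable choices. Score-based or coverage-based tie-breaking would fit the same interface.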
The annotation of short-read metagenomes is an essential process to understand the functional potential of sequenced microbial communities. Annotation techniques based solely on the identification of local matches tend to confound local sequence similarity and overall protein homology and thus do not reflect the complex multidomain architecture and the shuffling of functional domains in many protein families. Here, we present MetaGeneHunt to identify specific protein domains and to normalize the hit counts based on the domain length. We used MetaGeneHunt to investigate the potential for carbohydrate processing in the mouse gastrointestinal tract. We sampled, sequenced, and analyzed the microbial communities associated with the bolus in the stomach, intestine, cecum, and colon of five captive mice. Focusing on glycoside hydrolases (GHs), we found that, across samples, 58.3% of the 4,726,023 short-read sequences matching a GH domain-containing protein were located outside the domain of interest. Next, before comparing the samples, the counts of localized hits matching the domains of interest were normalized to account for the corresponding domain length. Microbial communities in the intestine and cecum displayed characteristic GH profiles matching distinct microbial assemblages. Conversely, the stomach and colon were associated with structurally and functionally more diverse and variable microbial communities. Across samples, despite fluctuations, changes in the functional potential for carbohydrate processing correlated with changes in community composition. Overall, MetaGeneHunt is a new way to quickly and precisely identify discrete protein domains in sequenced metagenomes processed with MG-RAST. In addition, using the sister program "GeneHunt" to create custom Reference Annotation Tables, MetaGeneHunt provides an unprecedented way to (re)investigate the precise distribution of any protein domain in short-read metagenomes.
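The two ideas in this abstract, localizing each read hit relative to the domain of interest and normalizing counts by domain length, can be sketched as follows. This is not MetaGeneHunt's code or its formula. The input layout, the "at least one overlapping residue" rule, and the "hits per 100 domain residues" scaling are assumptions chosen for illustration.

```python
from collections import defaultdict


def overlap_length(a_start, a_end, b_start, b_end):
    """Number of residues shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def localized_normalized_counts(hits, domains, per=100):
    """Count read hits that fall on a domain, normalized by domain length.

    hits:    iterable of (protein_id, hit_start, hit_end) on reference proteins
    domains: dict protein_id -> list of (family, dom_start, dom_end)
    Returns (normalized counts per family, number of hits outside any domain).
    Assumed normalization: each localized hit adds per / domain_length.
    """
    normalized = defaultdict(float)
    outside = 0
    for protein_id, hit_start, hit_end in hits:
        matched = False
        for family, dom_start, dom_end in domains.get(protein_id, []):
            if overlap_length(hit_start, hit_end, dom_start, dom_end) > 0:
                normalized[family] += per / (dom_end - dom_start + 1)
                matched = True
        if not matched:
            outside += 1
    return dict(normalized), outside


# Made-up reference table and read hits.
domains = {"P1": [("GH5", 101, 400)], "P2": [("GH10", 1, 200)]}
hits = [
    ("P1", 10, 40),    # matches the protein but lies outside the GH5 domain
    ("P1", 150, 180),  # inside GH5
    ("P1", 390, 420),  # partially overlaps GH5
    ("P2", 50, 80),    # inside GH10
]
counts, outside_hits = localized_normalized_counts(hits, domains)
```

Without the localization step, the first hit would be counted as a GH5 read even though it matches a region of the protein outside the catalytic domain. Dividing by domain length keeps long domains from dominating the comparison simply because they offer more residues for reads to match.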