Diversity-generating retroelements (DGRs) are novel genetic elements that use reverse transcription to generate vast numbers of sequence variants in specific target genes. Here, we present a detailed comparative bioinformatic analysis that depicts the landscape of DGR sequences in nature as represented by data in GenBank. Over 350 unique DGRs are identified, which together form a curated reference set of putatively functional DGRs. We classify target genes, variable repeats and DGR cassette architectures, and identify two new accessory genes. The great variability of target genes implies roles of DGRs in many undiscovered biological processes. There is much evidence for horizontal transfers of DGRs, and we identify lineages of DGRs that appear to have specialized properties. Because GenBank contains data from only 10% of described species, the compilation may not be wholly representative of DGRs present in nature. Indeed, many DGR subtypes are present only once in the set, and DGRs of the candidate phylum radiation bacteria and of the Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota, and Nanohaloarchaea (DPANN) archaea are exceptionally diverse in sequence, with little information available about the functions of their target genes. Nonetheless, this study provides a detailed framework for classifying and studying DGRs as they are uncovered in the future.
Diversity-generating retroelements (DGRs) are widely distributed in bacteria, archaea, and microbial viruses, and bring about unparalleled levels of sequence variation in target proteins. While DGR variable proteins share low sequence identity, the structures of several such proteins have revealed the C-type lectin (CLec)-fold as a conserved scaffold for accommodating massive sequence variation. This conservation has led to the suggestion that the CLec-fold may be useful in molecular surface display applications. Thermostability is an attractive feature in such applications, and thus we studied the variable protein of a DGR encoded by a prophage of the thermophile Thermus aquaticus. We report here the 2.8 Å resolution crystal structure of the variable protein from the T. aquaticus DGR, called TaqVP, and confirm that it has a CLec-fold. Remarkably, its variable region is nearly identical in structure to those of several other CLec-fold DGR variable proteins despite low sequence identity among these. TaqVP was found to be thermostable, which appears to be a property shared by several CLec-fold DGR variable proteins. These results provide impetus for the pursuit of the DGR variable protein CLec-fold in molecular display applications.