Proteins belonging to PD-(D/E)XK phosphodiesterases constitute a functionally diverse superfamily with representatives involved in replication, restriction, DNA repair and tRNA–intron splicing. Their malfunction in humans triggers severe diseases, such as Fanconi anemia and Xeroderma pigmentosum. To date there have been several attempts to identify and classify new PD-(D/E)XK phosphodiesterases using remote homology detection methods. Such efforts are complicated because the superfamily exhibits extreme sequence and structural divergence. Using advanced homology detection methods supported with superfamily-wide domain architecture and horizontal gene transfer analyses, we provide a comprehensive reclassification of proteins containing a PD-(D/E)XK domain. The PD-(D/E)XK phosphodiesterases span over 21 900 proteins, which can be classified into 121 groups of various families. Eleven of them, including DUF4420, DUF3883, DUF4263, COG5482, COG1395, Tsp45I, HaeII, Eco47II, ScaI, HpaII and Replic_Relax, are newly assigned to the PD-(D/E)XK superfamily. Some groups of PD-(D/E)XK proteins are present in all domains of life, whereas others occur in only a small number of organisms. We observed multiple horizontal gene transfers, even between human pathogenic bacteria or from Prokaryota to Eukaryota. Uncommon domain arrangements greatly elaborate the PD-(D/E)XK world. These include domain architectures suggesting regulatory roles in Eukaryotes, such as stress sensing and cell-cycle regulation. Our results may inspire further experimental studies aimed at identifying the exact biological functions, specific substrates and molecular mechanisms of the reactions performed by these highly diverse proteins.
The Ribonuclease H-like (RNHL) superfamily, also called the retroviral integrase superfamily, groups together numerous enzymes involved in nucleic acid metabolism and implicated in many biological processes, including replication, homologous recombination, DNA repair, transposition and RNA interference. The RNHL superfamily proteins show extensive divergence of sequences and structures. We conducted database searches to identify members of the RNHL superfamily (including those previously unknown), yielding >60 000 unique domain sequences. Our analysis led to the identification of new RNHL superfamily members, such as RRXRR (PF14239), DUF460 (PF04312, COG2433), DUF3010 (PF11215), DUF429 (PF04250 and COG2410, COG4328, COG4923), DUF1092 (PF06485), COG5558, OrfB_IS605 (PF01385, COG0675) and Peptidase_A17 (PF05380). Based on the clustering analysis, we grouped all identified RNHL domain sequences into 152 families. Phylogenetic studies revealed relationships between these families and suggested a possible history of the evolution of the RNHL fold and its active site. Our results revealed a clear division of the RNHL superfamily into exonucleases and endonucleases. Structural analyses of features characteristic of particular groups revealed a correlation between the orientation of the C-terminal helix and both the exonuclease/endonuclease function and the architecture of the active site. Our analysis provides a comprehensive picture of sequence-structure-function relationships in the RNHL superfamily that may guide functional studies of the previously uncharacterized protein families.
Fungi are able to switch between different lifestyles in order to adapt to environmental changes. Their ecological strategy is connected to their secretome, as fungi obtain nutrients by secreting hydrolytic enzymes into their surroundings and taking up the digested molecules. We focus on fungal serine proteases (SPs), the phylogenetic distribution of which has barely been described so far. In order to collect a complete set of fungal proteases, we searched over 600 fungal proteomes. The results suggest that serine proteases are more ubiquitous than expected. Of the 54 SP families described in the MEROPS Peptidase Database, 21 are present in fungi. Interestingly, 14 of them are also present in Metazoa and Viridiplantae; this suggests that, except for one (S64), all fungal SP families evolved before plants and fungi diverged. Most representatives of sequenced eukaryotic lineages encode a set of 13–16 SP families. The number of SPs from each family varies among the analysed taxa. The most abundant are S8 proteases. In order to verify hypotheses linking lifestyle and expansions of particular SP families, we performed statistical analyses and revealed previously undescribed associations. Here, we present a comprehensive evolutionary history of fungal SP families in the context of fungal ecology and the fungal tree of life.
PIN-like domains constitute a widespread superfamily of nucleases, diverse in terms of the reaction mechanism, substrate specificity, biological function and taxonomic distribution. Proteins with PIN-like domains are involved in central cellular processes, such as DNA replication and repair, mRNA degradation, transcription regulation and ncRNA maturation. In this work, we identify and classify the most complete set of PIN-like domains to provide the first comprehensive analysis of sequence–structure–function relationships within the whole PIN domain-like superfamily. Transitive sequence searches using highly sensitive methods for remote homology detection led to the identification of several new families, including representatives of Pfam (DUF1308, DUF4935) and CDD (COG2454), and 23 other families not classified in the public domain databases. Further sequence clustering revealed relationships between individual sequence clusters and showed heterogeneity within some families, suggesting a possible functional divergence. With five structural groups, 70 defined clusters, over 100,000 proteins, and broad biological functions, the PIN domain-like superfamily constitutes one of the largest and most diverse nuclease superfamilies. Detailed analyses of sequences and structures, domain architectures, and genomic contexts allowed us to predict the biological functions of several new families, including new toxin-antitoxin components and proteins involved in tRNA/rRNA maturation and transcription/translation regulation.
The last decade has brought growing experimental evidence of the mobilome's impact on host gene expression. We systematically analysed the genomic location of transposable elements (TEs) in 625 publicly available fungal genomes from the NCBI database in order to explore their potential roles in genome evolution and their correlation with species' lifestyle. We found that non-autonomous TEs and remnant copies are evenly distributed across genomes. As a consequence, they also massively overlap with regions annotated as genes, which suggests a great contribution of TE-derived sequences to the host's coding genome. Younger and potentially active TEs cluster with one another away from genic regions. This non-randomness is a sign of either selection against insertion of TEs in gene proximity or target site preference among some types of TEs. Proteins encoded by genes with old transposable element insertions have significantly fewer repeat and protein-protein interaction motifs but are richer in enzymatic domains. However, genes that are only proximal to TEs do not display any functional enrichment. Our findings show that adaptive cases of TE insertion remain a marginal phenomenon, and the overwhelming majority of TEs are evolving neutrally. Finally, animal-related and pathogenic fungi have more TEs inserted into genes than fungi with other lifestyles. This is the first systematic, kingdom-wide study concerning mobile elements and their genomic neighbourhood. The results should inspire further research into the roles TEs have played in evolution and how they shape the life we know today.