MotivationStructure based ligand discovery is one of the most successful approaches for augmenting the drug discovery process. Currently, there is a notable shift towards machine learning (ML) methodologies to aid such procedures. Deep learning has recently gained considerable attention as it allows the model to ‘learn’ to extract features that are relevant for the task at hand.ResultsWe have developed a novel deep neural network estimating the binding affinity of ligand–receptor complexes. The complex is represented with a 3D grid, and the model utilizes a 3D convolution to produce a feature map of this representation, treating the atoms of both proteins and ligands in the same manner. Our network was tested on the CASF-2013 ‘scoring power’ benchmark and Astex Diverse Set and outperformed classical scoring functions.Availability and implementationThe model, together with usage instructions and examples, is available as a git repository at http://gitlab.com/cheminfIBB/pafnucy.Supplementary information Supplementary data are available at Bioinformatics online.
AlphaFold2 revolutionized structural biology with the ability to predict protein structures with exceptionally high accuracy. Its implementation, however, lacks the code and data required to train new models. These are necessary to (i) tackle new tasks, like protein-ligand complex structure prediction, (ii) investigate the process by which the model learns, which remains poorly understood, and (iii) assess the model's generalization capacity to unseen regions of fold space. Here we report OpenFold, a fast, memory-efficient, and trainable implementation of AlphaFold2, and OpenProteinSet, the largest public database of protein multiple sequence alignments. We use OpenProteinSet to train OpenFold from scratch, fully matching the accuracy of AlphaFold2. Having established parity, we assess OpenFold's capacity to generalize across fold space by retraining it using carefully designed datasets. We find that OpenFold is remarkably robust at generalizing despite extreme reductions in training set size and diversity, including near-complete elisions of classes of secondary structure elements. By analyzing intermediate structures produced by OpenFold during training, we also gain surprising insights into the manner in which the model learns to fold proteins, discovering that spatial dimensions are learned sequentially. Taken together, our studies demonstrate the power and utility of OpenFold, which we believe will prove to be a crucial new resource for the protein modeling community.
In recent years machine learning (ML) took bioand cheminformatics fields by storm, providing new solutions for a vast repertoire of problems related to protein sequence, structure, and interactions analysis. ML techniques, deep neural networks especially, were proven more effective than classical models for tasks like predicting binding affinity for molecular complex.In this work we investigated the earlier stage of drug discovery process -finding druggable pockets on protein surface, that can be later used to design active molecules. For this purpose we developed a 3D fully convolutional neural network capable of binding site segmentation.Our solution has high prediction accuracy and provides intuitive representations of the results, which makes it easy to incorporate into drug discovery projects. The model's source code, together with scripts for most common use-cases is freely available at
Fungi are able to switch between different lifestyles in order to adapt to environmental changes. Their ecological strategy is connected to their secretome as fungi obtain nutrients by secreting hydrolytic enzymes to their surrounding and acquiring the digested molecules. We focus on fungal serine proteases (SPs), the phylogenetic distribution of which is barely described so far. In order to collect a complete set of fungal proteases, we searched over 600 fungal proteomes. Obtained results suggest that serine proteases are more ubiquitous than expected. From 54 SP families described in MEROPS Peptidase Database, 21 are present in fungi. Interestingly, 14 of them are also present in Metazoa and Viridiplantae – this suggests that, except one (S64), all fungal SP families evolved before plants and fungi diverged. Most representatives of sequenced eukaryotic lineages encode a set of 13–16 SP families. The number of SPs from each family varies among the analysed taxa. The most abundant are S8 proteases. In order to verify hypotheses linking lifestyle and expansions of particular SP, we performed statistical analyses and revealed previously undescribed associations. Here, we present a comprehensive evolutionary history of fungal SP families in the context of fungal ecology and fungal tree of life.
The last decade brought a still growing experimental evidence of mobilome impact on host’s gene expression. We systematically analysed genomic location of transposable elements (TEs) in 625 publicly available fungal genomes from the NCBI database in order to explore their potential roles in genome evolution and correlation with species’ lifestyle. We found that non-autonomous TEs and remnant copies are evenly distributed across genomes. In consequence, they also massively overlap with regions annotated as genes, which suggests a great contribution of TE-derived sequences to host’s coding genome. Younger and potentially active TEs cluster with one another away from genic regions. This non-randomness is a sign of either selection against insertion of TEs in gene proximity or target site preference among some types of TEs. Proteins encoded by genes with old transposable elements insertions have significantly less repeat and protein-protein interaction motifs but are richer in enzymatic domains. However, genes only proximal to TEs do not display any functional enrichment. Our findings show that adaptive cases of TE insertion remain a marginal phenomenon, and the overwhelming majority of TEs are evolving neutrally. Eventually, animal-related and pathogenic fungi have more TEs inserted into genes than fungi with other lifestyles. This is the first systematic, kingdom-wide study concerning mobile elements and their genomic neighbourhood. The obtained results should inspire further research concerning the roles TEs played in evolution and how they shape the life we know today.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.