Elucidating protein–ligand interaction is crucial for studying the function of proteins and compounds in an organism and critical for drug discovery and design. The problem of protein–ligand interaction is traditionally tackled by molecular docking and simulation, which is based on physical forces and statistical potentials and cannot effectively leverage cryo-EM data and existing protein structural information in the protein–ligand modeling process. In this work, we developed a deep learning bioinformatics pipeline (DeepProLigand) to predict protein–ligand interactions from cryo-EM density maps of proteins and ligands. DeepProLigand first uses a deep learning method to predict the structure of proteins from cryo-EM maps, which is averaged with a reference (template) structure of the proteins to produce a combined structure to add ligands. The ligands are then identified and added into the structure to generate a protein–ligand complex structure, which is further refined. The method based on the deep learning prediction and template-based modeling was blindly tested in the 2021 EMDataResource Ligand Challenge and was ranked first in fitting ligands to cryo-EM density maps. These results demonstrate that the deep learning bioinformatics approach is a promising direction for modeling protein–ligand interactions on cryo-EM data using prior structural information.
Estimating the accuracy of quaternary structural models of protein complexes and assemblies (EMA) is important for predicting quaternary structures and applying them to studying protein function and interaction. The pairwise similarity between structural models is proven useful for estimating the quality of protein tertiary structural models, but it has been rarely applied to predicting the quality of quaternary structural models. Moreover, the pairwise similarity approach often fails when many structural models are of low quality and similar to each other. To address the gap, we developed a hybrid method (MULTICOM_qa) combining a pairwise similarity score (PSS) and an interface contact probability score (ICPS) based on the deep learning inter‐chain contact prediction for estimating protein complex model accuracy. It blindly participated in the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) in 2022 and performed very well in estimating the global structure accuracy of assembly models. The average per‐target correlation coefficient between the model quality scores predicted by MULTICOM_qa and the true quality scores of the models of CASP15 assembly targets is 0.66. The average per‐target ranking loss in using the predicted quality scores to rank the models is 0.14. It was able to select good models for most targets. Moreover, several key factors (i.e., target difficulty, model sampling difficulty, skewness of model quality, and similarity between good/bad models) for EMA are identified and analyzed. The results demonstrate that combining the multi‐model method (PSS) with the complementary single‐model method (ICPS) is a promising approach to EMA.
The advent of single-particle cryogenic electron microscopy (cryo-EM) has brought forth a new era of structural biology, enabling the routine determination of large biological protein complexes and assemblies at atomic resolution. The high-resolution structures of protein complexes and assemblies significantly expedite biomedical research and drug discovery. However, automatically and accurately reconstructing protein structures from high-resolution density maps generated by cryo-EM is still time-consuming and challenging when template structures for the protein chains in a target protein complex are unavailable. Artificial intelligence (AI) methods such as deep learning trained on limited amounts of labeled cryo-EM density maps generate unstable reconstructions. To address this issue, we created a dataset called Cryo2Struct consisting of 7,600 preprocessed cryo-EM density maps whose voxels are labelled according to their corresponding known protein structures for training and testing AI methods to infer protein structures from density maps. It is larger and has better quality than any existing, publicly available dataset. We trained and tested deep learning models on Cryo2Struct to make sure it is ready for the large-scale development of AI methods for reconstructing protein structures from cryo-EM density maps. The source code, data and instructions to reproduce our results are freely available at https://github.com/BioinfoMachineLearning/cryo2struct.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.