Modern-day drug discovery is extremely expensive and time-consuming. Although computational approaches help accelerate drug discovery and reduce its cost, existing software packages for docking-based drug discovery suffer from both low accuracy and high latency. A few recent machine learning-based approaches to virtual screening improve the evaluation of protein–ligand binding affinity, but they rely heavily on conventional docking software to sample docking poses, which results in excessive execution latency. Here, we propose and evaluate a novel graph neural network (GNN)-based framework, MedusaGraph, which includes both pose-prediction (sampling) and pose-selection (scoring) models. Unlike previous machine learning-centric studies, MedusaGraph generates the docking poses directly and achieves a 10- to 100-fold speedup over state-of-the-art approaches, with slightly better docking accuracy.
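The split between pose prediction (sampling) and pose selection (scoring) can be illustrated with a minimal sketch. It is not MedusaGraph's implementation: `predict_pose`, `score_pose`, and `dock` are hypothetical names, and the two "models" are toy geometric stand-ins (a rigid translation toward an assumed pocket center, and a negative centroid distance) placed where trained GNNs would go.

```python
import math
import random

def centroid(pose):
    """Mean (x, y, z) position of a list of atom coordinates."""
    n = len(pose)
    return tuple(sum(atom[i] for atom in pose) / n for i in range(3))

def predict_pose(start, pocket_center, steps=3, rate=0.5):
    """Toy stand-in for a pose-prediction model (assumption, not a GNN):
    rigidly translates the ligand toward the pocket center, step by step."""
    pose = list(start)
    for _ in range(steps):
        c = centroid(pose)
        shift = tuple(rate * (p - ci) for p, ci in zip(pocket_center, c))
        pose = [tuple(a + s for a, s in zip(atom, shift)) for atom in pose]
    return pose

def score_pose(pose, pocket_center):
    """Toy stand-in for a pose-selection model: higher is better."""
    return -math.dist(centroid(pose), pocket_center)

def dock(ligand, pocket_center, n_starts=5, seed=0):
    """Generate candidate poses directly, then keep the best-scoring one."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_starts):
        # Random initial placement of the ligand around its input position.
        offset = tuple(rng.uniform(-5.0, 5.0) for _ in range(3))
        start = [tuple(a + o for a, o in zip(atom, offset)) for atom in ligand]
        candidates.append(predict_pose(start, pocket_center, steps=rng.randint(1, 5)))
    return max(candidates, key=lambda p: score_pose(p, pocket_center))

ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
pocket = (10.0, 0.0, 0.0)
best = dock(ligand, pocket)
```

In this sketch the sampler produces poses itself instead of calling external docking software, which is the structural point the abstract makes; everything about how the poses are produced and scored here is illustrative only.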
High-performance computational techniques have brought significant benefits to drug discovery efforts in recent decades. One of the most challenging problems in drug discovery is protein–ligand binding pose prediction. Conventional structure-based molecular docking methods aim to predict the most stable structure of the complex, and their performance depends heavily on the accuracy of the scoring or energy functions (approximations of affinity) evaluated for each pose of the protein–ligand complex, which must effectively guide the search through an exponentially large solution space. However, due to the heterogeneity of molecular structures, existing scoring methods are either tailored to a particular data set or fail to exhibit high accuracy. In this paper, we propose a convolutional neural network (CNN)-based model that learns to predict the stability factor of the protein–ligand complex and demonstrates that CNNs can improve existing docking software. Evaluation results on the PDBbind data set indicate that our approach reduces the execution time of the traditional docking-based method while improving accuracy. Our code, experiment scripts, and pretrained models are available at .
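How a scoring function guides the pose search can be shown with a minimal sketch. It assumes a pose is a short tuple of rigid-body parameters and uses a simple stochastic hill climb; `guided_search` and `toy_energy` are hypothetical names, and `toy_energy` is a placeholder for the slot where a learned model such as the CNN described above would be plugged in, not the authors' model.

```python
import random

def toy_energy(pose):
    """Placeholder scoring function (assumption): squared distance to an
    arbitrary optimum. Lower values mean a more stable pose."""
    target = (1.0, -2.0, 0.5)
    return sum((p - t) ** 2 for p, t in zip(pose, target))

def guided_search(score, start, n_iters=200, step=0.5, seed=0):
    """Stochastic hill climb: perturb the current best pose and keep the
    perturbation only when the scoring function says it is more stable."""
    rng = random.Random(seed)
    best, best_score = start, score(start)
    for _ in range(n_iters):
        candidate = tuple(x + rng.gauss(0.0, step) for x in best)
        candidate_score = score(candidate)
        if candidate_score < best_score:
            best, best_score = candidate, candidate_score
    return best, best_score

start_pose = (0.0, 0.0, 0.0)
final_pose, final_energy = guided_search(toy_energy, start_pose)
```

Because every accept or reject decision goes through `score`, an inaccurate scoring function steers the search toward the wrong region of the solution space, which is why scoring accuracy dominates docking performance.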