In computer-aided drug discovery (CADD), virtual screening (VS) is used for identifying the drug candidates that are most likely to bind to a molecular target in a large library of compounds. Most VS methods to date have focused on using canonical compound representations (e.g., SMILES strings, Morgan fingerprints) or generating alternative fingerprints of the compounds by training progressively more complex variational autoencoders (VAEs) and graph neural networks (GNNs). Although VAEs and GNNs led to significant improvements in VS performance, these methods suffer from reduced performance when scaling to large virtual compound datasets. The performance of these methods has shown only incremental improvements in the past few years. To address this problem, we developed a novel method using multiparameter persistence (MP) homology that produces topological fingerprints of the compounds as multidimensional vectors. Our primary contribution is framing the VS process as a new topology-based graph ranking problem by partitioning a compound into chemical substructures informed by the periodic properties of its atoms and extracting their persistent homology features at multiple resolution levels. We show that the margin loss fine-tuning of pretrained Triplet networks attains highly competitive results in differentiating between compounds in the embedding space and ranking their likelihood of becoming effective drug candidates. We further establish theoretical guarantees for the stability properties of our proposed MP signatures, and demonstrate that our models, enhanced by the MP signatures, outperform state-of-the-art methods on benchmark datasets by a wide and highly statistically significant margin (e.g., 93% gain for Cleves-Jain and 54% gain for DUD-E Diverse dataset).
Supervised graph classification is one of the most actively developing areas in machine learning (ML), with a broad range of domain applications, from social media to bioinformatics. Given a collection of graphs with categorical labels, the goal is to predict correct classes for unlabelled graphs. However, currently available ML tools view each such graph as a standalone entity and, as such, do not account for complex interdependencies among graphs. We propose a novel knowledge representation for graph learning called a {\it Graph of Graphs} (GoG). The key idea is to construct a new abstraction where each graph in the collection is represented by a node, while an edge then reflects similarity among the graphs. Such similarity can be assessed via a suitable graph distance. As a result, the graph classification problem can be then reformulated as a node classification problem. We show that the proposed new knowledge representation approach not only improves classification performance but substantially enhances robustness against label perturbation attacks.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.