PSICHIC: physicochemical graph neural network for learning protein-ligand interaction fingerprints from sequence data

Koh, Huan Yee; Nguyen, Anh T.N.; Pan, Shirui; May, Lauren T.; Webb, Geoffrey I.

doi:10.1101/2023.09.17.558145

Cited by 7 publications

(4 citation statements)

References 62 publications

“…Compared with structure-based and complex-based methods, the model shows comparable performances across all metrics with lower standard deviations. Furthermore, the model performs on par with other more sophisticated state-of-the-art methods, namely PSICHIC [58] and TankBind. It is worth noting that PSICHIC also leverages the residue-level embeddings extracted from the pre-trained ESM; however, Koh et al [58] use these embeddings to construct 2D graphs of proteins.…”

Section: Results (mentioning)

confidence: 99%

“…Furthermore, the model performs on par with other more sophisticated state-of-the-art methods, namely PSICHIC [58] and TankBind. It is worth noting that PSICHIC also leverages the residue-level embeddings extracted from the pre-trained ESM; however, Koh et al [58] use these embeddings to construct 2D graphs of proteins. This does not preserve the SE(3)-symmetry (rotations and translations), which is an important property in learning three-dimensional structures.…”

Section: Results (mentioning)

confidence: 99%

Multimodal protein representation learning and target-aware variational auto-encoders for protein-binding ligand generation

Khang Ngo, Son Hy

2024

Mach. Learn.: Sci. Technol.

Without knowledge of specific pockets, generating ligands based on the global structure of a protein target plays a crucial role in drug discovery as it helps reduce the search space for potential drug-like candidates in the pipeline. However, contemporary methods require optimizing tailored networks for each protein, which is arduous and costly. To address this issue, we introduce TargetVAE, a target-aware variational auto-encoder that generates ligands with desirable properties including high binding affinity and high synthesizability to arbitrary target proteins, guided by a multimodal deep neural network built based on geometric and sequence models, named Protein Multimodal Network (PMN), as the prior for the generative model. PMN unifies different representations of proteins (e.g., primary structure - sequence of amino acids, 3D tertiary structure, and residue-level graph) into a single representation. Our multimodal architecture learns from the entire protein structure and is able to capture their sequential, topological, and geometrical information by utilizing language modeling, graph neural networks, and geometric deep learning. We showcase the superiority of our approach by conducting extensive experiments and evaluations, including predicting protein-ligand binding affinity in the PDBbind v2020 dataset as well as the assessment of generative model quality, ligand generation for unseen targets, and docking score computation. Empirical results demonstrate the promising and competitive performance of our proposed approach. Our software package is publicly available at https://github.com/HySonLab/Ligand_Generation

“…The incorporation of these diverse datasets forms a comprehensive and varied tested, allowing us to thoroughly assess the predictive capabilities of our multimodal representation when estimating protein–ligand binding affinities. To ensure a fair and standardized evaluation, we meticulously followed the test/training/validation split settings as outlined in previous studies, specifically adhering to the configurations defined in the respective sources for the DAVIS, KIBA, and PDBbind version 2020 datasets [15, 43]. By maintaining this consistency, we aimed to create a level playing field for comparisons, allowing for an equitable assessment of our multimodal representation’s performance.…”

Section: Methods (mentioning)

confidence: 99%

Multimodal pretraining for unsupervised protein representation learning

Nguyen,

2024

Biology Methods and Protocols

Proteins are complex biomolecules essential for numerous biological processes, making them crucial targets for advancements in molecular biology, medical research, and drug design. Understanding their intricate, hierarchical structures and functions is vital for progress in these fields. To capture this complexity, we introduce MPRL—Multimodal Protein Representation Learning, a novel framework for symmetry-preserving multimodal pretraining that learns unified, unsupervised protein representations by integrating primary and tertiary structures. MPRL employs Evolutionary Scale Modeling (ESM-2) for sequence analysis, Variational Graph Auto-Encoders (VGAE) for residue-level graphs, and PointNet Autoencoder (PAE) for 3D point clouds of atoms, each designed to capture the spatial and evolutionary intricacies of proteins while preserving critical symmetries. By leveraging Auto-Fusion to synthesize joint representations from these pretrained models, MPRL ensures robust and comprehensive protein representations. Our extensive evaluation demonstrates that MPRL significantly enhances performance in various tasks such as protein-ligand binding affinity prediction, protein fold classification, enzyme activity identification, and mutation stability prediction. This framework advances the understanding of protein dynamics and facilitates future research in the field. Our source code is publicly available at https://github.com/HySonLab/Protein_Pretrain.

“…The incorporation of these diverse datasets forms a comprehensive and varied tested, allowing us to thoroughly assess the predictive capabilities of our multimodal representation when estimating protein-ligand binding affinities. To ensure a fair and standardized evaluation, we meticulously followed the test/training/validation split settings as outlined in previous studies, specifically adhering to the configurations defined in the respective sources for the DAVIS, KIBA, and PDBbind v2020 datasets [57, 43]. By maintaining this consistency, we aimed to create a level playing field for comparisons, allowing for an equitable assessment of our multimodal representation’s performance.…”

Section: Methods (mentioning)

confidence: 99%

Multimodal Pretraining for Unsupervised Protein Representation Learning

Nguyen,

2023

Preprint

In this paper, we introduce a framework of symmetry-preserving multimodal pretraining to learn a unified representation of proteins in an unsupervised manner, encompassing both primary and tertiary structures. Our approach involves proposing specific pretraining methods for sequences, graphs, and 3D point clouds associated with each protein structure, leveraging the power of large language models and generative models. We present a novel way of combining representations from multiple sources of information into a single global representation for proteins. We carefully analyze the performance of our framework in the pretraining tasks. For the fine-tuning tasks, our experiments have shown that our new multimodal representation can achieve competitive results in protein-ligand binding affinity prediction, protein fold classification, enzyme identification and mutation stability prediction. We expect that this work will accelerate future research in proteins. Our source code, written in the PyTorch deep learning framework, is publicly available at https://github.com/HySonLab/Protein_Pretrain.
