Pitfalls of machine learning models for protein–protein interaction networks

Lannelongue, Loïc; Inouye, Michael

doi:10.1093/bioinformatics/btae012

Bioinformatics

2024

Abstract: Motivation: Protein-protein interactions (PPIs) are essential to understanding biological pathways as well as their roles in development and disease. Computational tools, based on classic machine learning, have been successful at predicting PPIs in silico, but the lack of consistent and reliable frameworks for this task has led to network models that are difficult to compare and discrepancies between algorithms that remain unexplained. Results: …

Cited by 2 publications

References 26 publications

Complementary evaluation of computational methods for predicting single residue effects on peptide binding specificities

Ayyildiz, Noske, Gisdon et al. 2024. Preprint.

Understanding the interactions that make up protein-protein or protein-peptide interfaces is a crucial step towards applications in biotechnology. The ability to discriminate between different partners defines the specificity of a binding protein and is equally important as its affinity to the target. Whereas many established computational methods provide an estimate of binding or non-binding, comparing similar ligands is still significantly more challenging. Here we evaluated the capability of predicting ligand binding specificity using three established but conceptually different physics-based methods for protein design. As a model system, we analyzed the binding of peptides to designed armadillo repeat proteins, where a single residue of the peptide was changed systematically, and compared the results with an experimental reference data set. The mutation of a single residue can have a strong impact on binding affinity and specificity, which is difficult to capture in sampling and scoring. We critically assessed the prediction accuracy of the computational methods and found that the prediction performance of each method is differently affected, suggesting the use of a complementary approach of the evaluated methods.

Author Summary: Proteins have to recognize other proteins and peptides in the cell with high specificity. To be able to predict such interactions with high precision would be immensely useful for medical and biotechnological applications. Here we tested three computational methods that use physics-based force fields on an experimental dataset and evaluated how well these predictions can be used to discriminate binding pockets on a single residue level. The predicted values of each method and the experimentally determined specificities correlated well, even though each approach had its biases. Therefore, we correlated the predictions with each other to complement the strengths and weaknesses of all approaches.

PIPENN-EMB: ensemble net and protein embeddings generalise protein interface prediction beyond homology

Thomas, Garcia Fernandez, Haydarlou et al. 2024. Preprint.

Protein interactions are crucial for understanding biological functions and disease mechanisms, but predicting these remains a complex task in computational biology. Increasingly, Deep Learning models are having success in interface prediction. This study presents PIPENN-EMB which explores the added value of using embeddings from the ProtT5-XL protein language model. Our results show substantial improvement over the previously published PIPENN model for protein interaction interface prediction, reaching an MCC of 0.313 vs. 0.249, and AUC-ROC 0.800 vs. 0.755 on the BIO_DL_TE test set. We furthermore show that these embeddings cover a broad range of 'hand-crafted' protein features in ablation studies. PIPENN-EMB reaches state-of-the-art performance on the ZK448 dataset for protein-protein interface prediction. We showcase predictions on 25 resistance-related proteins from Mycobacterium tuberculosis. Furthermore, whereas other state-of-the-art sequence-based methods perform worse for proteins that have little recognisable homology in their training data, PIPENN-EMB generalises to remote homologs, yielding stable AUC-ROC across all three test sets with less than 30% sequence identity to the training dataset, and even to proteins with less than 15% sequence identity.
