The identification of drug/compound–target interactions (DTIs) constitutes the basis of drug discovery, for which computational predictive approaches have been developed. As a relatively new data-driven paradigm, proteochemometric (PCM) modeling utilizes both protein and compound properties as a pair at the input level and processes them via statistical/machine learning. The representation of input samples (i.e., proteins and their ligands) in the form of quantitative feature vectors is crucial for extracting interaction-related properties during artificial learning and the subsequent prediction of DTIs. Lately, the representation learning approach, in which input samples are automatically featurized by training and applying a machine/deep learning model, has been utilized in biomedical sciences. In this study, we performed a comprehensive investigation of different computational approaches/techniques for protein featurization (including both conventional approaches and novel learned embeddings), data preparation and exploration, machine learning-based modeling, and performance evaluation, with the aim of achieving better data representations and more successful learning in DTI prediction. For this, we first constructed realistic and challenging benchmark datasets at small, medium, and large scales to serve as reliable gold standards for specific DTI modeling tasks. We developed and applied a network analysis-based splitting strategy to divide these datasets into structurally different training and test folds. Using these datasets together with various featurization methods, we trained and tested DTI prediction models and evaluated their performance from different angles. Our main findings can be summarized under three items: (i) random splitting of datasets into training and test folds leads to near-complete data memorization and produces highly over-optimistic results, and should therefore be avoided; (ii) learned protein sequence embeddings work well in DTI prediction and offer high potential, even though interaction-related properties of proteins (e.g., their structures) are not used during the self-supervised training of these models; and (iii) during the learning process, PCM models tend to rely heavily on compound features while partially ignoring protein features, primarily due to the inherent bias in DTI data, indicating the need for new and unbiased datasets. We hope this study will aid researchers in designing robust and high-performing data-driven DTI prediction systems that have real-world translational value in drug discovery.
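To make finding (i) concrete, the sketch below illustrates a dissimilarity-aware train/test split in Python. It is a simplified stand-in for the network analysis-based splitting strategy developed in the study, not the actual pipeline, and it assumes hypothetical SMILES strings and interaction labels. It clusters compounds with RDKit's Butina algorithm over Morgan fingerprint Tanimoto distances and then assigns whole clusters to either fold via scikit-learn's GroupShuffleSplit, so that structurally similar compounds cannot leak between training and test sets as they would under random splitting.

# Illustrative sketch only: cluster-based split to avoid data memorization.
# NOT the paper's exact network analysis-based strategy; variable names
# (smiles, labels) are hypothetical toy data.
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from rdkit.ML.Cluster import Butina
from sklearn.model_selection import GroupShuffleSplit

smiles = ["CCO", "CCN", "c1ccccc1O", "c1ccccc1N", "CC(=O)OC1=CC=CC=C1C(=O)O"]
labels = np.array([0, 1, 0, 1, 1])  # hypothetical interaction labels

# 1) Featurize compounds as Morgan (ECFP-like) fingerprints.
mols = [Chem.MolFromSmiles(s) for s in smiles]
fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=1024) for m in mols]

# 2) Build a condensed lower-triangular distance list (1 - Tanimoto similarity).
dists = []
for i in range(1, len(fps)):
    sims = DataStructs.BulkTanimotoSimilarity(fps[i], fps[:i])
    dists.extend(1.0 - s for s in sims)

# 3) Cluster compounds; structurally similar molecules share a cluster.
clusters = Butina.ClusterData(dists, len(fps), 0.4, isDistData=True)
groups = np.empty(len(fps), dtype=int)
for cid, members in enumerate(clusters):
    groups[list(members)] = cid

# 4) Assign whole clusters to either the training or the test fold, preventing
#    near-duplicate compounds from appearing on both sides of the split.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.4, random_state=42)
train_idx, test_idx = next(splitter.split(smiles, labels, groups=groups))
print("train:", train_idx, "test:", test_idx)

An analogous grouping can be applied on the protein side (e.g., by sequence similarity), and the study's network analysis-based strategy operates on the full bipartite DTI data rather than on compounds alone.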