2023
DOI: 10.1186/s13321-023-00689-w

How to approach machine learning-based prediction of drug/compound–target interactions

Abstract: The identification of drug/compound–target interactions (DTIs) constitutes the basis of drug discovery, for which computational predictive approaches have been developed. As a relatively new data-driven paradigm, proteochemometric (PCM) modeling utilizes both protein and compound properties as a pair at the input level and processes them via statistical/machine learning. The representation of input samples (i.e., proteins and their ligands) in the form of quantitative feature vectors is crucial for the extract…

Cited by 18 publications (14 citation statements)
References 89 publications
“…ProtBENCH [30] contains protein family‐specific bioactivity data, spanning multiple protein superfamilies, including membrane receptors, ion channels, transporters, transcription factors, epigenetic regulators, and enzymes with five subgroups (i.e., transferases, proteases, hydrolases, oxidoreductases, and other enzymes). The family subsets vary in number of interactions (19 K–220 K), number of proteins (100–1 K), and number of compounds (10 K–120 K).…”
Section: Methods (mentioning)
confidence: 99%
“…The pre-trained model from the DrugBank dataset underwent fine-tuning using the random forest regression method, and the learning rate was selected from the range [1e-5, 1e-4, 4e-4, 1e-3]. Furthermore, different batch sizes, namely [8, 16, 32], were experimented with. To ensure robustness, the five-fold cross-validation technique was utilized.…”
Section: Methods (mentioning)
confidence: 99%
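The tuning setup quoted above (four learning rates, three batch sizes, five-fold cross-validation) can be sketched as follows. This is a minimal illustration, not the citing paper's code: it assumes the two lists were combined exhaustively, which the statement does not say, and the helper names `hyperparameter_grid` and `k_fold_indices` are hypothetical.

```python
from itertools import product

# Values taken from the quoted statement.
learning_rates = [1e-5, 1e-4, 4e-4, 1e-3]
batch_sizes = [8, 16, 32]


def hyperparameter_grid():
    """Enumerate every (learning rate, batch size) pair.

    Assumption: an exhaustive grid; the quoted text does not state
    how the candidate values were combined.
    """
    return [
        {"learning_rate": lr, "batch_size": bs}
        for lr, bs in product(learning_rates, batch_sizes)
    ]


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs for k contiguous, unshuffled folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, val_idx
        start = stop


grid = hyperparameter_grid()
folds = list(k_fold_indices(12, k=5))
```

Each configuration in `grid` would be trained and scored once per fold, and the fold scores averaged before configurations are compared. In practice the sample order is usually shuffled before folding.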
“…This approach allowed for the generation of negative samples, resulting in a balanced dataset for analysis. Dissimilar-compound-split dataset: This dataset is based on protein family-specific datasets (Large-scale) [32], further constructed by applying a strategy that only considers compound similarities while distributing bioactivity data points into train-test splits, as presented in Table 2. Compounds in train and test splits are dissimilar (Tanimoto score < 0.5).…”
Section: Dataset and Evaluation Metrics (mentioning)
confidence: 99%
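The dissimilarity criterion in the statement above (Tanimoto score < 0.5 between train and test compounds) can be checked with a short sketch. It assumes each compound fingerprint is represented as a set of on-bit indices; the fingerprints below are toy data, the function names are hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen here, not something the source specifies.

```python
def tanimoto(a, b):
    """Tanimoto similarity |A ∩ B| / |A ∪ B| between two sets of on-bits."""
    if not a and not b:
        return 0.0  # assumed convention for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_cross_similarity(train_fps, test_fps):
    """Highest similarity between any train compound and any test compound."""
    return max(tanimoto(x, y) for x in train_fps for y in test_fps)


def is_dissimilar_split(train_fps, test_fps, threshold=0.5):
    """True if every train/test pair scores strictly below the threshold."""
    return max_cross_similarity(train_fps, test_fps) < threshold


# Toy fingerprints (hypothetical data), given as sets of on-bit indices.
train_fps = [{1, 2, 3, 4}, {2, 3, 5, 8}]
test_fps = [{10, 11, 12, 1}, {20, 21, 2, 22}]

split_ok = is_dissimilar_split(train_fps, test_fps)
```

The pairwise check is quadratic in the number of compounds, so for datasets at the scale quoted on this page a clustering-based split would be more practical than this brute-force comparison.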
“…While the ligand-based approach relies on a sufficient number of known ligands for a given protein; the Molecular docking approach is limited to available 3D protein structures [8]. Conversely, machine learning-based methods have emerged as a highly promising avenue for predicting DPIs [9]-[12].…”
Section: Introduction (mentioning)
confidence: 99%