2019
DOI: 10.1021/acs.jcim.8b00663

Evaluation of Cross-Validation Strategies in Sequence-Based Binding Prediction Using Deep Learning

Abstract: Binding prediction between targets and drug-like compounds through Deep Neural Networks has generated promising results in recent years, outperforming traditional machine learning-based methods. However, the generalization capability of these classification models is still an issue to be addressed. In this work, we explored how different cross-validation strategies applied to data from different molecular databases affect the performance of binding prediction proteochemometrics models. These strategies are: (…

Cited by 30 publications
(44 citation statements)
References 48 publications
“…Explanatory models have found use in the formal description of differences in performance as a function of design factors (Lopez-del Rio et al., 2019; Picart-Armada et al., 2019). Following (Picart-Armada et al., 2019), the trends in AUROC and AUPRC were described through logistic-like quasibinomial models with a logit link function, as a generalisation of logistic models to prevent over- and under-dispersion issues.…”
Section: Methods (citation type: mentioning)
confidence: 99%
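
For readers unfamiliar with the model class named in this statement, the following is a generic textbook sketch of a quasibinomial model with a logit link, not the citing authors' exact specification. The symbols are assumptions introduced here: y_i is a performance value in [0, 1] (such as an AUROC), x_i the vector of design factors, beta the coefficients, and phi a dispersion parameter estimated from the data.

```latex
\[
\operatorname{logit}(\mu_i) = \log\frac{\mu_i}{1-\mu_i} = \mathbf{x}_i^{\top}\boldsymbol{\beta},
\qquad
\mu_i = \mathbb{E}[y_i],
\qquad
\operatorname{Var}(y_i) = \phi\,\mu_i\,(1-\mu_i)
\]
```

Setting phi = 1 recovers the binomial variance of ordinary logistic regression; phi > 1 accommodates over-dispersion and phi < 1 under-dispersion.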
“…CNNs imply translational invariance [10] and can be used to find relevant patterns with biological meaning [8,5,11,12]. For their part, bidirectional RNNs (and the derived Long Short-Term Memory and Gated Recurrent Units) are appropriate for modelling biological sequences since they are suited for data with a sequential but non-causal structure, variable length, and long-range dependencies [13,14,15,16]. Both architectures are usually combined, as in DEEPre [17], where a CNN-RNN model performs a hierarchical classification of enzymes.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…The analogy between text and proteins, understood as sequences of characters with a meaning, motivates the application of Natural Language Processing (NLP) techniques to amino acid sequences. Along these lines, machine-learning derived embeddings [23,24,25,26] and one-hot encoding [14,9,27,12,17,7] have become very popular. Specifically, the latter method has been widely used in protein-based DL models since neural networks are able to extract features from raw data.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
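
As an illustration of the one-hot encoding mentioned in this statement, here is a minimal Python sketch. It is not taken from the cited or citing papers: the function name `one_hot_encode`, the alphabet ordering, and the choice to zero-pad short sequences and encode unknown residues as all-zero rows are all assumptions made for this example.

```python
# The 20 standard amino acids in one-letter code.
# The ordering is an arbitrary choice made for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence, max_len):
    """Return a max_len x 20 matrix (list of lists) of 0/1 values.

    Assumptions of this sketch: sequences longer than max_len are
    truncated, shorter ones are padded with all-zero rows, and residues
    outside the 20-letter alphabet (e.g. 'X') become all-zero rows.
    """
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, residue in enumerate(sequence.upper()[:max_len]):
        idx = AA_INDEX.get(residue)
        if idx is not None:
            matrix[pos][idx] = 1
    return matrix


# Hypothetical 4-residue fragment, padded to length 6.
encoded = one_hot_encode("MKVX", max_len=6)
```

Each row has at most one non-zero entry, so the resulting matrix can be fed directly to a convolutional or recurrent layer that learns its own features from the raw sequence.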