2023
DOI: 10.1021/acs.jcim.3c00132

Large-Scale Modeling of Sparse Protein Kinase Activity Data

Abstract: Protein kinases are a protein family that plays an important role in several complex diseases such as cancer and cardiovascular and immunological diseases. Protein kinases have conserved ATP binding sites, which when targeted can lead to similar activities of inhibitors against different kinases. This can be exploited to create multitarget drugs. On the other hand, selectivity (lack of similar activities) is desirable in order to avoid toxicity issues. There is a vast amount of protein kinase activity data in …

Cited by 8 publications (5 citation statements)
References 43 publications
“…Instead of simply grouping all analogs together into a given training/test fold, one can instead cluster all molecules based on their similarity, then build a test set comprised of the most "distinct" member of each cluster (chemical scaffold). This strategy - described by as "neighbor splitting" or "scaffold splitting" [49, 56-60] - does not entirely eliminate the information leakage that underlies "Standard Split" and "Split by Inhibitor", but rather minimizes the value of the leaked information to best mimic a real-world scenario in which prospective inputs are not wholly independent from the training examples.…”
mentioning
confidence: 99%
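
The selection step described in the statement above can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the function name `most_distinct_per_cluster` is hypothetical, cluster labels and a pairwise similarity matrix are assumed to be precomputed, and "most distinct" is interpreted here (an assumption) as the member with the lowest mean similarity to the other members of its cluster.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def most_distinct_per_cluster(
    cluster_ids: Sequence[Hashable],
    similarity: Sequence[Sequence[float]],
) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx), holding out one molecule per cluster.

    Assumption of this sketch: the "most distinct" member is the one with
    the lowest mean similarity to the rest of its own cluster.
    """
    # Collect the molecule indices that belong to each cluster.
    members: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)

    test_idx: List[int] = []
    for idxs in members.values():
        if len(idxs) < 2:
            # Singletons stay in training (a choice made for this sketch).
            continue

        def mean_sim(i: int) -> float:
            others = [j for j in idxs if j != i]
            return sum(similarity[i][j] for j in others) / len(others)

        # Ties are broken in favor of the first member encountered.
        test_idx.append(min(idxs, key=mean_sim))

    held_out = set(test_idx)
    train_idx = [i for i in range(len(cluster_ids)) if i not in held_out]
    return train_idx, sorted(test_idx)
```

Because every held-out molecule still has cluster neighbors in the training set, a split built this way reduces rather than removes leakage, which is the trade-off the statement describes.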
“…The splitting strategy of the dataset reflects how the model would be used in real scenarios [27, 31, 32]. And it would greatly affect the credibility of the results because of issues such as data breaches.…”
Section: Methods
mentioning
confidence: 99%
“…The choice of how to split data can greatly influence our impression of future performance of the model [22,70]. QSPRpred supports any scikit-learn-style [71] data splitter class that has a method split(X,y) that yields for each split/fold the indices for each subset.…”
Section: Data Splitting
mentioning
confidence: 99%
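
To make the splitter interface mentioned above concrete, here is a minimal sketch of a class exposing a `split(X, y)` method that yields index subsets. The class name `GroupHoldoutSplit` and its parameters are hypothetical, the grouping labels (for example scaffold or cluster IDs) are assumed to be supplied by the caller, and whether QSPRpred accepts plain Python lists rather than NumPy index arrays has not been verified here.

```python
import random
from typing import Hashable, Iterator, List, Sequence, Tuple


class GroupHoldoutSplit:
    """Hypothetical scikit-learn-style splitter that holds out whole groups."""

    def __init__(self, groups: Sequence[Hashable],
                 test_fraction: float = 0.2, seed: int = 0) -> None:
        self.groups = list(groups)
        self.test_fraction = test_fraction
        self.seed = seed

    def split(self, X, y=None) -> Iterator[Tuple[List[int], List[int]]]:
        """Yield a single (train_indices, test_indices) pair."""
        if len(X) != len(self.groups):
            raise ValueError("X and groups must have the same length")

        # Unique group labels in order of first appearance, then shuffled
        # with a fixed seed so the split is reproducible.
        unique = list(dict.fromkeys(self.groups))
        if len(unique) < 2:
            raise ValueError("need at least two groups to split")
        random.Random(self.seed).shuffle(unique)

        # Hold out at least one group, but always leave one for training.
        n_test = max(1, round(len(unique) * self.test_fraction))
        n_test = min(n_test, len(unique) - 1)
        test_groups = set(unique[:n_test])

        train = [i for i, g in enumerate(self.groups) if g not in test_groups]
        test = [i for i, g in enumerate(self.groups) if g in test_groups]
        yield train, test
```

A caller would iterate over `GroupHoldoutSplit(groups).split(X, y)` and index the data with each yielded pair; no group appears on both sides of a split.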
“…Therefore, PCM has inherent applicability challenges that combine those of single-task and multi-task modelling (i.e. dataset size [19,20], data balance [21,22], sparsity [23], but also unique featurization requirements that need to take the proteins themselves into account [19,20]).…”
Section: Introduction
mentioning
confidence: 99%