2012
DOI: 10.1007/s10994-012-5287-6
Efficient cross-validation for kernelized least-squares regression with sparse basis expansions

Abstract: We propose an efficient algorithm for calculating hold-out and cross-validation (CV) type of estimates for sparse regularized least-squares predictors. Holding out H data points with our method requires O(min(H²n, Hn²)) time provided that a predictor with n basis vectors is already trained. In addition to holding out training examples, also some of the basis vectors used to train the sparse regularized least-squares predictor with the whole training set can be removed from the basis vector set used in the…
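To illustrate the kind of shortcut the abstract refers to, the sketch below shows the classical single-hold-out (H = 1, i.e. leave-one-out) identity for plain ridge regression: the LOO residuals can be read off from one trained model via the hat-matrix diagonal, with no retraining. This is only a minimal NumPy illustration of the general idea, not the paper's algorithm, which extends hold-out computation to H points and to sparse basis expansions.

```python
import numpy as np

def loo_residuals_ridge(X, y, lam):
    """Exact leave-one-out residuals for ridge regression in one pass.

    Classical shortcut: e_i = (y_i - yhat_i) / (1 - h_ii), where h_ii is
    the i-th diagonal entry of the hat matrix
    H = X (X^T X + lam*I)^{-1} X^T. This is the single-hold-out special
    case; the paper generalizes such computations to holding out H points
    and to sparse basis vector sets.
    """
    n, d = X.shape
    # (X^T X + lam*I)^{-1} X^T, computed via a linear solve for stability
    A = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)
    hat = X @ A                      # hat (smoother) matrix
    yhat = hat @ y                   # in-sample predictions
    return (y - yhat) / (1.0 - np.diag(hat))
```

The result matches retraining the model n times with one point held out each time, but costs only a single fit plus O(n) post-processing.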

Cited by 11 publications (4 citation statements)
References 28 publications
“…The observation that new drug targets are easier to predict than new targeted compounds is consistent with previous work [ 8 ]. Future improvements in the experimental drug–target bioactivity data coverage and quality, both in the individual profiling studies that focus on specific drug and target families, such as kinase inhibitors [ 17 , 18 ], as well as in the general drug and target databases, such as ChEMBL [ 37 ], could make it possible to start developing in silico prediction tools that can generalize beyond the training data and can be used, for instance, for prioritization of the most potential drug or target panels for experimental validation in human assays in vivo .…”
Section: Discussion
confidence: 99%
“…For example, the recent study by van Laarhoven et al [ 17 ] showed that a regularized least-squares (RLS) model was able to predict binary drug–target interactions at almost perfect prediction accuracies when evaluated using a simple LOO-CV. Although RLS has proven to be an effective model in many applications [ 18 , 19 ], we argue that a part of this superior predictive power can be attributed to the oversimplified formulation of the drug–target prediction problem, as well as unrealistic evaluation of the model performance. Another source of potential bias is that simple cross-validation (CV) cannot evaluate the effect of adjusting the model parameters, and may therefore easily lead to selection bias and overoptimistic prediction results [ 20–22 ].…”
Section: Introduction
confidence: 99%
“…Regularized least-squares (RLS) is an efficient model used in different types of applications (Pahikkala et al, 2012a,b). Van Laarhoven et al (2011) used RLS for the binary prediction of DTIs and achieved outstanding performance.…”
Section: Computational Prediction of Drug–Target Binding Affinities
confidence: 99%
“…When this is not computationally feasible, one may approximate full LPO-SCV by randomly sampling a subset of all the possible pairs. Further, for ridge regression classifiers, fast LPO-SCV can be implemented using the fast holdout algorithms (Pahikkala et al 2012) implemented in the RLScore open source library (Pahikkala and Airola 2016).…”
Section: Spatial Leave-Pair-Out CV
confidence: 99%