Drug sensitivity prediction constitutes one of the main challenges in personalized medicine. The major difficulty of this problem stems from the fact that the sensitivity of cancer cells to treatment depends on an unknown subset of a large number of biological features. Although feature selection is the key to interpretable results and the identification of potential biomarkers, a comprehensive assessment of feature selection methods for drug sensitivity prediction has not yet been performed. We propose feature selection approaches driven by prior knowledge of drug targets, target pathways, and gene expression signatures. We assess these methodologies on the Genomics of Drug Sensitivity in Cancer (GDSC) dataset, a panel of around 1000 cell lines screened against multiple anticancer compounds. We compare our results with a baseline model utilizing genome-wide gene expression features and with common data-driven feature selection techniques. In total, 2484 unique models were evaluated, providing a comprehensive study of feature selection strategies for the drug response prediction problem. For 23 drugs, the models achieve better predictive performance when the features are selected according to prior knowledge of drug targets and pathways. The best correlation between observed and predicted response on the test set is achieved for Linifanib (r=0.75). Extending the drug-dependent features with gene expression signatures yields the models that are most predictive of drug response for 60 drugs, with Dabrafenib as the best-performing example. Examples of how pre-selection of features benefits model interpretability are given for Dabrafenib, Linifanib and Quizartinib. Based on GDSC drug data, we find that feature selection driven by prior knowledge tends to yield better results for drugs targeting specific genes and pathways, while models with genome-wide features perform better for drugs affecting general mechanisms such as metabolism and DNA replication.
For a substantial group of the compounds, even a very small number of features based on simple drug properties is often highly predictive of drug sensitivity, can explain the mechanism of drug action, and can serve as a guideline for prescription. In general, choosing an appropriate feature selection strategy has the potential to yield interpretable models that can inform therapy design.
Pharmacogenomics | Machine learning | Personalized medicine