Lung cancers with a mutated epidermal growth factor receptor (EGFR) are a major contributor to cancer fatalities globally. Targeted tyrosine kinase inhibitors (TKIs) have been developed against EGFR and show encouraging results for survival rate and quality of life. However, drug resistance may affect treatment plans, and treatment efficacy may be lost after about a year. Predicting the response to EGFR-TKIs in EGFR-mutated lung cancer patients is therefore a key research area. In this study, we propose a personalized drug response prediction model (PDRP), based on molecular dynamics (MD) simulations and machine learning, to predict the response of lung cancer patients to the first-generation FDA-approved small-molecule EGFR-TKIs Gefitinib and Erlotinib. Each patient’s unique mutation status is modeled in an MD simulation to extract molecular-level geometric features, and additional clinical features are incorporated into the machine learning model for drug response prediction. The complete feature set includes demographic and clinical information (DCI), geometric properties of the drug-target binding site, and the binding free energy of the drug-target complex obtained from the MD simulation. PDRP incorporates an XGBoost classifier, which achieves state-of-the-art performance with 97.5% accuracy, 93% recall, 96.5% precision, and 94% F1-score on a 4-class drug response prediction task. We found that the geometry of the binding pocket combined with the binding free energy is a good predictor of drug response, whereas clinical information had little impact on the performance of the model. The proposed model could be tested on other types of cancer. We believe PDRP will support the planning of effective treatment regimens based on clinical-genomic information. The source code and related files are available on GitHub at: https://github.com/rizwanqureshi123/PDRP/.
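The abstract reports accuracy, recall, precision, and F1-score for a 4-class task but does not say how per-class scores are combined. The sketch below assumes macro-averaging (the unweighted mean over classes), which is one common convention and shows why overall accuracy can sit above the averaged recall when a rare class is harder to predict. The function name `classification_metrics` and the confusion matrix are hypothetical and are not PDRP data.

```python
from typing import Dict, List


def classification_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    Macro-averaging is an assumption; the abstract does not name the scheme.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = cm[c][c]
        predicted_as_c = sum(cm[row][c] for row in range(k))  # column sum
        truly_c = sum(cm[c])                                  # row sum
        p = tp / predicted_as_c if predicted_as_c else 0.0
        r = tp / truly_c if truly_c else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


# Hypothetical 4-class confusion matrix (not PDRP data): class 3 is rare
# and one of its two samples is misclassified as class 2.
example_cm = [
    [50, 0, 0, 0],
    [0, 50, 0, 0],
    [0, 0, 50, 0],
    [0, 0, 1, 1],
]
metrics = classification_metrics(example_cm)
# accuracy = 151/152 (about 0.993), while macro recall = 0.875
print(metrics)
```

Under a weighted average instead of a macro average, the rare class would count for much less and recall would move closer to accuracy.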
Non-small cell lung cancer (NSCLC) is a major cause of death worldwide, accounting for about 80% to 85% of lung cancer cases. It is well known that mutations of the epidermal growth factor receptor (EGFR) may lead to NSCLC. First-generation drugs are effective initially, but almost all patients develop drug resistance after about a year due to a secondary mutation. Computational methods are efficient tools for investigating drug resistance and for drug design and discovery. In particular, molecular dynamics (MD) simulation enables us to study and analyze the behavior of proteins and molecules at the atomic level; MD simulations offer extraordinary insight into biomolecules and are a valuable tool for computer-aided drug discovery. Earlier studies on EGFR focused only on the kinase domain. Because EGFR is a multi-domain protein, mutations in the kinase domain may affect the function of other domains. Therefore, it is important to investigate the complete structure of EGFR and its mutants. In this paper, we first generate the complete structure of EGFR and perform MD simulations for wild-type EGFR, EGFR with the L858R mutation, and EGFR with the L858R and T790M mutations. We divide the complete structure of EGFR and its mutants into 8 domains according to the reference crystal structure. We then treat atom trajectories as time-series signals and estimate their power spectral densities using the autoregressive integrated (ARI) model, which provides interesting insights. Dynamic time warping is used to analyze the similarity between the domains of the structures. Interesting patterns are observed that may be useful for investigating drug resistance and for drug design. Furthermore, the Pearson correlation coefficient and the peaks and widths of the power spectral density are calculated for each domain. The simulation results provide useful insight into the conformational dynamics of EGFR, such as atom motion and protein stability.
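One plausible reading of an "autoregressive integrated" spectral estimate is an ARIMA(p, d, 0) model: difference a coordinate series d times, fit an AR(p) model (here by solving the Yule-Walker equations with the Levinson-Durbin recursion), and evaluate the parametric spectrum S(f) = σ² / |1 − Σₖ aₖ e^(−i2πfk)|². The abstract does not give the order, the differencing, or the estimator, so p = 6, d = 1, the synthetic signal, and all function names below are assumptions. The spectrum reported is that of the differenced (stationary) series, and "width" is taken to mean full width at half maximum, which is also an assumption.

```python
import cmath
import math
import random
from typing import List, Sequence, Tuple


def difference(x: Sequence[float], d: int) -> List[float]:
    """Apply first-order differencing d times (the 'integrated' part)."""
    y = list(x)
    for _ in range(d):
        y = [y[i] - y[i - 1] for i in range(1, len(y))]
    return y


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Biased autocorrelation estimates r[0..max_lag] of the de-meaned series."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    return [sum(z[i] * z[i - k] for i in range(k, n)) / n for k in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Solve Yule-Walker for x[n] = sum_k a_k x[n-k] + e[n]; return (a_1..a_p, sigma^2)."""
    a = [0.0] * (order + 1)  # a[0] unused so that a[k] matches a_k
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        kappa = acc / err  # reflection coefficient
        prev = a[:]
        a[m] = kappa
        for k in range(1, m):
            a[k] = prev[k] - kappa * prev[m - k]
        err *= 1.0 - kappa * kappa
    return a[1:], err


def ar_psd(coeffs: Sequence[float], sigma2: float, freqs: Sequence[float]) -> List[float]:
    """Parametric PSD S(f) = sigma^2 / |1 - sum_k a_k exp(-i 2 pi f k)|^2."""
    psd = []
    for f in freqs:
        denom = 1 - sum(a * cmath.exp(-2j * math.pi * f * (k + 1))
                        for k, a in enumerate(coeffs))
        psd.append(sigma2 / abs(denom) ** 2)
    return psd


def peak_and_width(freqs: Sequence[float], psd: Sequence[float]) -> Tuple[float, float]:
    """Peak frequency and full width at half maximum, to grid resolution."""
    i = max(range(len(psd)), key=lambda j: psd[j])
    half = psd[i] / 2
    lo, hi = i, i
    while lo > 0 and psd[lo] > half:
        lo -= 1
    while hi < len(psd) - 1 and psd[hi] > half:
        hi += 1
    return freqs[i], freqs[hi] - freqs[lo]


# Hypothetical stand-in for one atom coordinate over 512 frames: linear drift
# plus an oscillation at 0.1 cycles/frame plus small seeded noise.
rng = random.Random(0)
signal = [0.01 * n + math.sin(2 * math.pi * 0.1 * n) + rng.gauss(0, 0.1)
          for n in range(512)]

order = 6                      # assumed AR order p
x_d = difference(signal, d=1)  # assumed differencing order d; removes the drift
coeffs, sigma2 = levinson_durbin(autocorrelation(x_d, order), order)
freqs = [i / 1000 for i in range(501)]  # 0 to 0.5 cycles/frame
psd = ar_psd(coeffs, sigma2, freqs)
peak_f, width = peak_and_width(freqs, psd)
print(f"peak at {peak_f:.3f} cycles/frame, FWHM about {width:.3f}")
```

Differencing turns the linear drift into a constant that the de-meaning step removes, so the fitted spectrum is dominated by the oscillation rather than the trend.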
The domains are less correlated in the L858R mutant and even more weakly correlated when the second mutation (T790M) occurs. The warping patterns change due to mutation, and the movement of atoms is distorted. Hence, it becomes difficult for a drug to bind to the protein. These findings will be useful for understanding the characteristics of EGFR and for the computer-aided drug design process for NSCLC patients.
INDEX TERMS Autoregressive integrated model, drug resistance, dynamic time warping, epidermal growth factor receptor, molecular dynamics, non-small cell lung cancer.
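The two similarity measures behind these findings behave differently, and a small example makes the contrast concrete. The sketch below compares two per-frame series (for instance, a per-domain quantity sampled over simulation time; that choice of input is an assumption) using a textbook dynamic time warping distance with absolute-difference local cost and the Pearson correlation coefficient. The series and function names are hypothetical.

```python
import math
from typing import Sequence


def dtw_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Classic dynamic time warping distance with |x_i - y_j| as local cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    # D[i][j] = minimal cumulative cost of aligning x[:i] with y[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # x advances, y repeats
                                 D[i][j - 1],      # y advances, x repeats
                                 D[i - 1][j - 1])  # both advance
    return D[n][m]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-frame series for two domains; domain_b is domain_a
# delayed by one frame.
domain_a = [0, 1, 2, 3, 2, 1, 0, 0]
domain_b = [0, 0, 1, 2, 3, 2, 1, 0]
print(dtw_distance(domain_a, domain_b))  # 0.0: warping absorbs the delay
print(pearson(domain_a, domain_b))       # about 0.662: lock-step comparison penalizes it
```

Because DTW aligns the series before comparing them, it separates differences in shape from differences in timing, while the Pearson coefficient compares frames one-to-one and therefore drops for the same delayed motion.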