Background: Pancreatic ductal adenocarcinoma (PDAC) has a very poor prognosis, so there is a need for new biomarkers for the early diagnosis of PDAC and for the prediction of patient survival. In this study, genome-wide RNA and microRNA sequencing data were analyzed with bioinformatics and machine learning approaches to identify differentially expressed genes (DEGs), which were then validated in an additional cohort of PDAC patients.
Methods: Genome-wide RNA sequencing and clinical data from pancreatic cancer patients were extracted from The Cancer Genome Atlas (TCGA) database to identify DEGs. Kaplan-Meier survival analysis was used to assess prognostic biomarkers. Several ensemble learning techniques, namely Random Forest (RF), Max Voting, AdaBoost, gradient boosting machines (GBM) and Extreme Gradient Boosting (XGB), were compared, and GBM, which achieved 100% accuracy, was selected for further analysis. In addition, protein-protein interactions (PPI), molecular pathways, co-expression of DEGs, and correlations between DEGs and clinical data were analyzed. Finally, we evaluated the candidate genes, miRNAs, and their combinations identified by the machine learning algorithms and survival analysis.
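The Kaplan-Meier analysis mentioned above estimates the survival function as a product over the observed death times, S(t) = ∏ over t_i ≤ t of (1 − d_i / n_i), where d_i is the number of deaths at time t_i and n_i is the number of patients still at risk just before t_i. The minimal sketch below implements that estimator in plain Python to show how censored patients are handled. It is an illustration only, not the authors' pipeline: the function name `kaplan_meier` and the toy follow-up data are hypothetical, and a real analysis would normally use a dedicated survival library and compare curves between expression groups with a log-rank test.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Hypothetical helper: return (death_time, survival_probability) pairs.

    times:  follow-up time for each patient (e.g. months)
    events: 1 if death was observed at that time, 0 if the patient was censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    # Sort patients by follow-up time so the risk set shrinks monotonically.
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []

    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        n_at_risk -= removed

    return curve


if __name__ == "__main__":
    # Hypothetical toy cohort: follow-up in months, 1 = death, 0 = censored.
    times = [2, 3, 3, 5, 7, 8]
    events = [1, 1, 0, 1, 0, 1]
    for t, s in kaplan_meier(times, events):
        print(f"t={t}: S(t)={s:.3f}")
```

Censored patients (event 0) never lower the curve directly; they only shrink the risk set, which is why the patient censored at month 7 makes the final death at month 8 drop the estimate to zero.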
Results: The machine learning analysis identified 23 downregulated and 5 upregulated genes, as well as 7 downregulated and 20 upregulated microRNAs, in PDAC. The key genes BMF, FRMD4A, ADAP2, PPP1R17 and CACNG3 had the highest coefficients in advanced stages of the disease. In addition, survival analysis showed that decreased expression of hsa.miR.642a, hsa.mir.363, CD22, BTNL9 and CTSW, and overexpression of hsa.miR.153.1, hsa.miR.539 and hsa.miR.412, were associated with reduced survival. CTSW was identified as a novel genetic marker, and this finding was validated by RT-PCR.
Conclusion: Machine learning algorithms may be used to identify key dysregulated genes and miRNAs involved in the pathogenesis of PDAC, which could help detect the disease at earlier stages. Our data also demonstrate the prognostic and diagnostic value of CTSW in PDAC.