This work proposes a novel framework, an explainable ensemble of neural networks, to classify non-small cell lung cancer (NSCLC) samples as adenocarcinoma or squamous cell carcinoma using DNA methylation data. The framework combines an ensemble of shallow neural networks with soft-voting decision fusion for classification. The neural networks are then interpreted via SHapley Additive exPlanations (SHAP) to highlight the most relevant DNA methylation CpG probes. The resulting model achieves a classification accuracy of 0.989, outperforming other ensemble models such as XGBoost, Random Forest, AdaBoost, CatBoost, and GradientBoosting. SHAP analysis reveals 702 relevant CpG probes, which map to a set of 499 signature biomarkers. Of these signature biomarkers, 104 are potentially druggable according to the DGIdb database, and 40 overlap with the OncoKB cancer gene list. In the future, the framework could be made robust enough to classify other carcinomas. Moreover, classification based on multiomics data could provide better accuracy and more stable biomarkers.
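The soft-voting decision fusion mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `soft_vote`, the three-member ensemble, the probability values, and the column order (adenocarcinoma first, squamous cell carcinoma second) are all assumptions made for illustration. Each network is assumed to output class probabilities, which are averaged before the final label is chosen.

```python
import numpy as np


def soft_vote(prob_list):
    """Average per-model class probabilities and return (mean_probs, labels).

    prob_list: list of arrays, each of shape (n_samples, n_classes),
    one array per ensemble member (hypothetical network outputs).
    """
    stacked = np.stack(prob_list, axis=0)   # (n_models, n_samples, n_classes)
    mean_probs = stacked.mean(axis=0)       # fuse by averaging probabilities
    labels = mean_probs.argmax(axis=1)      # most probable class per sample
    return mean_probs, labels


# Made-up outputs of three networks for two samples.
# Assumed column order: [adenocarcinoma, squamous cell carcinoma].
p1 = np.array([[0.90, 0.10], [0.20, 0.80]])
p2 = np.array([[0.45, 0.55], [0.40, 0.60]])
p3 = np.array([[0.45, 0.55], [0.30, 0.70]])

mean_probs, labels = soft_vote([p1, p2, p3])
print(mean_probs)
print(labels)
```

In the first sample, two of the three hypothetical networks lean slightly toward class 1, yet the averaged probability favors class 0 because the first network is much more confident. This is the practical difference between soft voting and a simple majority vote over predicted labels.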