BACKGROUND
Cancer is a life-threatening disease and a leading cause of death worldwide. In the United States alone, there were an estimated 611,000 cancer deaths and over 2 million new cases in 2024. The rising incidence of major cancers, including among younger individuals, highlights the need for early screening and monitoring of risk factors to manage and reduce cancer risk.
OBJECTIVE
To identify the factors most important for predicting the risk of four major cancer types (breast, colorectal, lung, and prostate) using explainable machine learning (ML) techniques, in response to the increasing burden of cancer.
METHODS
De-identified electronic health record data from MIMIC-III were used to identify patients with one of the four cancer types who had longitudinal hospital visits before receiving a cancer diagnosis. These patients were matched with patients without a cancer diagnosis using propensity scores based on demographic factors, and the matched records were combined. Three models, penalized Logistic Regression (LR), Random Forest (RF), and Multilayer Perceptron (MLP), were trained to rank risk factors for each cancer type, with feature importance analysis applied to the RF and MLP models. Rank-Biased Overlap (RBO) was used to compare the similarity of the ranked risk factors across cancer types.
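To illustrate how Rank-Biased Overlap compares two ranked lists of risk factors, the following is a minimal sketch rather than the study's implementation. It assumes the extrapolated form of RBO evaluated to the depth of the shorter list, rankings without duplicate entries, and a persistence parameter p = 0.9; the abstract does not state which variant or value of p was used. The function name and the example rankings are hypothetical and are not results from the study.

```python
def rank_biased_overlap(ranking_a, ranking_b, p=0.9):
    """Extrapolated RBO of two duplicate-free rankings (assumed variant).

    p in (0, 1) controls top-weightedness: smaller p puts more weight
    on agreement near the top of the lists.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    depth = min(len(ranking_a), len(ranking_b))
    if depth == 0:
        return 0.0

    seen_a, seen_b = set(), set()
    overlap = 0          # size of the intersection of the two prefixes
    weighted_sum = 0.0   # running sum of p^(d-1) * agreement at depth d
    agreement = 0.0

    for d in range(1, depth + 1):
        item_a, item_b = ranking_a[d - 1], ranking_b[d - 1]
        if item_a == item_b:
            overlap += 1
        else:
            # Each new item may match something already seen in the other list.
            if item_a in seen_b:
                overlap += 1
            if item_b in seen_a:
                overlap += 1
        seen_a.add(item_a)
        seen_b.add(item_b)

        agreement = overlap / d
        weighted_sum += p ** (d - 1) * agreement

    # Extrapolation term: assume the agreement at the final depth persists.
    return (1 - p) * weighted_sum + p ** depth * agreement


# Hypothetical rankings for two cancer types (illustrative order only).
ranking_one = ["hyperlipidemia", "diabetes", "anemia"]
ranking_two = ["diabetes", "hyperlipidemia", "anemia"]
similarity = rank_biased_overlap(ranking_one, ranking_two)
```

Under these assumptions, identical rankings score 1 and rankings with no shared items score 0, so a cancer type whose ranking yields low RBO against the others has a comparatively distinct risk factor pattern.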
RESULTS
We evaluated the prediction performance of the explainable ML models; the MLP achieved an AUC of 0.78 for breast cancer, 0.76 for colorectal cancer, 0.84 for lung cancer, and 0.78 for prostate cancer. In addition to demographic risk factors, the most prominent non-traditional risk factors overlapped across models and cancer types and included hyperlipidemia, diabetes, depressive disorders, heart diseases, and anemia. The similarity analysis indicated that lung cancer had a risk factor pattern distinct from those of the other cancer types.
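As a reading aid for the AUC values above, here is a minimal sketch of how such a score can be computed, assuming AUC refers to the area under the ROC curve. It uses the pairwise (Mann-Whitney) formulation: the fraction of case-control pairs in which the case receives the higher predicted risk, with ties counted as half. The function name and the toy labels and scores are hypothetical, not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0      # case ranked above control
            elif pos_score == neg_score:
                wins += 0.5      # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = later cancer diagnosis, 0 = matched control.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

Under this reading, an AUC of 0.84 would mean that a randomly chosen case is assigned a higher predicted risk than a randomly chosen matched control about 84% of the time.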
CONCLUSIONS
The findings demonstrate the effectiveness of explainable ML models in identifying non-traditional risk factors for major cancers and highlight the importance of considering the distinct risk profiles of different cancer types. These insights may contribute to efficient cancer screening and tailored cancer prevention strategies, which in turn may support clinical decision-making.