“…Cox R, Kaplan-Meier

[13] SEER 2004-2016; number of variables not disclosed; classification + regression for 3 categories (≤6 months, 7-24 months, and ≥24 months); ANN, RNN, CNN, RF, SVM, NB, GBM, LR
[7] SEER 2010-2015; 12 variables; classification (1-, 3-, 5-year survival); XGB, LR, NB, DT, KNN, RF, SVM
[14] SEER 2010-2015; 14 variables; classification (5-year survival); LR, NB, Gaussian K-base NB
[15] SEER 1973-2012; 114 variables; classification (0.5-, 1-, 5-year survival); RF, ANN Data Mining
[16] SEER 2004-2009; 13 variables; classification + regression for 3 categories (≤6 months, 7-24 months, and ≥24 months); GBM, RF, GLM, EV

Survival status prediction, length-of-survival estimation, and cancer patient clustering are the primary topics in the machine learning literature that uses the SEER dataset, with the focus placed on model accuracy. Common classification, clustering, and regression models employed within the second group of research include artificial neural networks (ANNs), support vector machines (SVMs), Naïve Bayes (NB), decision trees (DTs), random forest (RF), ensemble methods, K-means, and bidirectional data partitioning (BDP) [7,13-22]. Despite the great strides made in lung cancer prediction research, several challenges still exist:…”