Lung cancer detection using machine learning involves training a model on a dataset of medical images, such as CT scans, to identify patterns and features associated with lung cancer. Researchers have previously developed various computer-aided diagnostic (CAD) systems for early prediction of lung cancer. These studies typically extracted a single type of feature, such as texture or morphology; however, accuracy can be improved by combining features. In this study, we extracted gray-level co-occurrence matrix (GLCM), autoencoder, and Haralick texture features. We combined these features and evaluated performance using robust machine learning algorithms, including decision tree (DT), Naïve Bayes (NB), and support vector machine (SVM) with different kernel functions. Performance was evaluated using standard performance measures. The hybrid feature sets GLCM + Autoencoder and Haralick + Autoencoder yielded the highest detection performance with the SVM Gaussian and radial basis function (RBF) kernels, reaching 100% sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy, with an AUC of 1.00, followed by the SVM polynomial kernel with an accuracy of 99.89% and an AUC of 1.00. For GLCM + Haralick, SVM Gaussian yielded an accuracy of 99.56% and SVM RBF an accuracy of 99.35%. The results indicate that the proposed feature extraction methodology can be useful for predicting lung cancer at an early stage for further diagnosis.
MSC: Artificial Intelligence, Machine Learning, Lung Cancer, Cross-Validation
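
The performance measures named in the abstract (sensitivity, specificity, PPV, NPV, and accuracy) all follow from the four cells of a binary confusion matrix. The sketch below is illustrative only and is not the study's evaluation code: the function name `diagnostic_metrics` and the example counts are hypothetical, and "positive" is assumed to mean a cancer case.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard binary diagnostic measures from confusion-matrix counts.

    tp: cancer cases predicted as cancer      (true positives)
    fp: non-cancer cases predicted as cancer  (false positives)
    tn: non-cancer cases predicted non-cancer (true negatives)
    fn: cancer cases predicted non-cancer     (false negatives)
    """
    return {
        # Fraction of actual cancer cases that were detected.
        "sensitivity": _ratio(tp, tp + fn),
        # Fraction of actual non-cancer cases that were correctly cleared.
        "specificity": _ratio(tn, tn + fp),
        # Fraction of positive predictions that were truly cancer.
        "ppv": _ratio(tp, tp + fp),
        # Fraction of negative predictions that were truly non-cancer.
        "npv": _ratio(tn, tn + fn),
        # Fraction of all cases classified correctly.
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
    }


if __name__ == "__main__":
    # Hypothetical counts chosen for illustration; not results from the study.
    for name, value in diagnostic_metrics(tp=90, fp=5, tn=95, fn=10).items():
        print(f"{name}: {value:.4f}")
```

A classifier reaches 100% on all five measures only when both `fp` and `fn` are zero, which is the case reported for the best hybrid feature sets.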