Lung cancer is the leading cause of cancer death in both men and women, and it has one of the lowest five-year survival rates of all cancer types. Predicting lung cancer relapse after surgery remains a challenge, especially for non-small cell lung cancer (NSCLC). This study aimed to improve prediction and detection by using an eXtreme Gradient Boosting (XGBoost) model to diagnose lung cancer and to predict its relapse after surgery from gene expression data and the transcriptome changes caused by cancer. This can help improve the early handling of tumour progression and reduce painful treatment. The study used real next-generation sequencing (NGS) RNA-seq and microarray gene expression datasets for different types of lung cancer. The results demonstrated the effectiveness of the XGBoost model compared with other machine learning models, especially in handling imbalanced datasets.
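A common way for gradient boosting to cope with imbalanced data is to up-weight the minority (positive) class in the loss. The sketch below assumes binary labels (1 = relapse, 0 = no relapse) and follows XGBoost's `scale_pos_weight` convention: under the logistic loss, each positive sample's gradient and Hessian are multiplied by a weight, typically set to the ratio of negative to positive samples. The helper names are hypothetical, and the abstract does not state that the authors used this particular mechanism.

```python
import math
from typing import List, Sequence, Tuple


def scale_pos_weight(labels: Sequence[int]) -> float:
    """Ratio of negative to positive samples (hypothetical helper)."""
    pos = sum(1 for y in labels if y == 1)
    neg = sum(1 for y in labels if y == 0)
    if pos == 0:
        raise ValueError("need at least one positive sample")
    return neg / pos


def _sigmoid(margin: float) -> float:
    # Numerically stable logistic function for large |margin|.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)


def weighted_logistic_grad_hess(
    margins: Sequence[float], labels: Sequence[int], spw: float
) -> Tuple[List[float], List[float]]:
    """First and second derivatives of the logistic loss w.r.t. the raw
    margin, with positive samples up-weighted by `spw`."""
    grads: List[float] = []
    hess: List[float] = []
    for m, y in zip(margins, labels):
        p = _sigmoid(m)                    # predicted probability of class 1
        w = spw if y == 1 else 1.0         # only positives are re-weighted
        grads.append(w * (p - y))          # gradient of log loss
        hess.append(w * p * (1.0 - p))     # Hessian of log loss
    return grads, hess


# Example: 1 positive vs 9 negatives, all margins at 0 (p = 0.5).
labels = [1] + [0] * 9
spw = scale_pos_weight(labels)             # 9.0
g, h = weighted_logistic_grad_hess([0.0] * 10, labels, spw)
print(spw, g[0], h[0], g[1], h[1])         # 9.0 -4.5 2.25 0.5 0.25
```

In practice the computed ratio would be passed to the library, for example `xgboost.XGBClassifier(scale_pos_weight=spw)`. Re-weighting the loss keeps every sample, unlike under-sampling the majority class, which discards data from already small gene expression cohorts.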
Lung cancer is one of the deadliest diseases in the world. Non-small cell lung cancer (NSCLC) is the most common and dangerous type of lung cancer. Although NSCLC is preventable, and curable in some cases if diagnosed at an early stage, the vast majority of patients are diagnosed very late. Furthermore, NSCLC usually recurs some time after treatment. It is therefore of paramount importance to predict NSCLC recurrence so that specific and suitable treatments can be sought. However, conventional methods of predicting cancer recurrence rely solely on histopathology data, and their predictions are unreliable in many cases. Microarray gene expression (GE) technology provides a promising and reliable way to predict NSCLC recurrence by analysing the GE of sample cells. This study proposes a new model based on gene expression programming that uses microarray datasets for NSCLC recurrence prediction. To this end, the authors also propose a hybrid method to rank and select relevant prognostic genes related to NSCLC recurrence. The proposed model was evaluated on real NSCLC microarray datasets and compared with other representative models. The results demonstrated the effectiveness of the proposed model.
Most lung cancers do not cause symptoms until the disease is at a late stage, which is why lung cancer has a high fatality rate compared with other cancer types. Many scientists have tried to use artificial intelligence algorithms to achieve accurate lung cancer detection. This paper used extreme gradient boosting (XGBoost) models as the base model because of their effectiveness. It improved lung cancer detection performance by proposing a three-stage model: a feature stage, an XGBoost parallel stage, and a selection stage. This study used two types of gene expression datasets: RNA-sequencing and microarray profiles. The results showed the effectiveness of the proposed model, especially in dealing with imbalanced datasets, with 100% in each of sensitivity, specificity, precision, F1-score, area under the curve (AUC), and accuracy when it was applied to all of the datasets used in this study.
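The six metrics reported above have standard definitions, sketched below for reference. The function names are hypothetical, and the sketch assumes binary labels coded 1 for cancer and 0 for normal. AUC is computed with the rank-based (Mann-Whitney) formulation: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. This illustrates the definitions only; it is not the authors' evaluation code.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for hard 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # recall on positives
    specificity = tn / (tn + fp) if tn + fp else 0.0   # recall on negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1, "accuracy": accuracy}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5      # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Imbalanced toy example: 3 positives, 5 negatives.
y = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(binary_metrics(y, pred))
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))   # 0.75
```

On the toy example, accuracy is 0.75 while sensitivity is only about 0.67, which is why reporting sensitivity and specificity alongside accuracy matters for imbalanced datasets. A score of 100% on every metric corresponds to zero false positives and zero false negatives, with every positive scored above every negative.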