Background
Breast cancer has long been one of the most commonly diagnosed cancers in women worldwide, and approximately 90% of cancer-related deaths are caused by metastasis. Identifying new metastasis-related biomarkers is therefore an urgent task, both to predict the metastatic status of breast cancer and to provide new therapeutic targets.
Methods
In this study, an eXtreme Gradient Boosting (XGBoost) model with hyperparameters optimized by grid search was established to assist in identifying metastatic breast tumors from gene expression data. Under ten-fold cross-validation, the optimized XGBoost classifier achieved a mean area under the ROC curve (AUC) of 0.82, higher overall than that of other classifiers such as decision tree (DT), support vector machine (SVM), k-nearest neighbors (KNN), logistic regression (LR), and random forest (RF).
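The evaluation protocol described above (grid search scored by mean AUC over ten stratified folds) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `auc`, `stratified_folds`, `cv_mean_auc`, `grid_search`, and `column_scorer` are hypothetical, and `column_scorer` is a toy stand-in where a trained XGBoost model would normally produce the held-out scores.

```python
import random
from itertools import product
from statistics import mean


def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds, dealing each class out evenly across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cv_mean_auc(fit_predict, X, y, params, k=10):
    """Mean held-out AUC over k stratified folds for one hyperparameter setting."""
    fold_aucs = []
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        scores = fit_predict(
            [X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], params,
        )
        fold_aucs.append(auc([y[i] for i in test_idx], scores))
    return mean(fold_aucs)


def grid_search(fit_predict, X, y, grid, k=10):
    """Try every combination in the grid; return (best mean AUC, best params)."""
    names = list(grid)
    candidates = [dict(zip(names, combo)) for combo in product(*(grid[n] for n in names))]
    scored = [(cv_mean_auc(fit_predict, X, y, p, k), p) for p in candidates]
    return max(scored, key=lambda t: t[0])


def column_scorer(X_train, y_train, X_test, params):
    """Hypothetical stand-in model: ignores training data and scores by one feature column."""
    return [row[params["column"]] for row in X_test]


# Toy data (assumption): column 0 separates the classes, column 1 is uninformative.
y = [0] * 20 + [1] * 20
X = [[float(label), float(i % 3)] for i, label in enumerate(y)]

best_auc, best_params = grid_search(column_scorer, X, y, {"column": [0, 1]})
print(best_auc, best_params)
```

In practice, `column_scorer` would be replaced by a function that fits a classifier on the training fold with the given hyperparameters and returns predicted probabilities for the held-out fold; the fold and scoring logic stays the same.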
Results
A novel 6-gene signature (SQSTM1, GDF9, LINC01125, PTGS2, GVINP1, and TMEM64) was selected by feature importance ranking, and a series of in vitro experiments was conducted to verify the potential role of each biomarker. Overall, the effect of SQSTM1 in tumor cells was classified as a risk factor, whereas the effects of the other five genes (GDF9, LINC01125, PTGS2, GVINP1, and TMEM64) in immune cells were classified as protective factors.
Conclusions
Our findings will enable more accurate prediction of the metastatic status of breast cancer and will aid the discovery of biomarkers related to breast cancer metastasis.