Background
This study examines the prognostic value of metabolism-related genes in cancer. Using transcriptomic data and clinical information for multiple cancer types from public databases, we identify metabolism-related genes associated with prognosis and construct a prognostic model, offering a new approach to prognosis assessment and personalized treatment of cancer patients.
Methods
We first obtain transcriptomic data and clinical information for multiple cancer types from public databases (such as TCGA, GTEx, and UCSC), including gene expression data and patient survival information. We then acquire a list of metabolism-related genes from the KEGG database and match it against the gene expression data of the cancer samples to identify differentially expressed metabolism-related genes. Next, we use univariate Cox regression to identify prognosis-related genes and apply LASSO and random survival forest algorithms for feature selection, retaining the most important metabolic features. Based on the selected features, we construct prognostic models with several machine learning algorithms, including nonlinear CoxPH and Extra Survival Trees, and optimize their parameters. Finally, we apply the resulting prognostic model to datasets of other cancer types for pan-cancer validation and performance evaluation.
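The univariate Cox screening step can be illustrated with a minimal sketch. It fits a one-covariate proportional hazards model by Newton-Raphson on the Breslow partial likelihood and reports a Wald p-value per gene. This is not the pipeline used in the study: the function name `univariate_cox`, the gene labels `GENE_A` and `GENE_B`, the toy survival data, and the 0.05 cutoff are all assumptions made for illustration. In practice a survival analysis library would normally be used instead.

```python
import math


def univariate_cox(x, time, event, n_iter=50, tol=1e-9):
    """Fit h(t | x) = h0(t) * exp(beta * x) for a single covariate.

    Newton-Raphson on the Breslow partial likelihood.
    Returns (beta, hazard_ratio, wald_p_value).
    """
    beta, info = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        grad, info = 0.0, 0.0
        for i in range(n):
            if not event[i]:
                continue  # censored samples contribute only through risk sets
            s0 = s1 = s2 = 0.0
            for j in range(n):
                if time[j] >= time[i]:  # sample j is still at risk at time[i]
                    w = math.exp(beta * x[j])
                    s0 += w
                    s1 += w * x[j]
                    s2 += w * x[j] * x[j]
            mean = s1 / s0
            grad += x[i] - mean            # score contribution
            info += s2 / s0 - mean * mean  # observed information contribution
        if info <= 1e-12:
            break
        # Clamp the Newton step so early iterations stay stable.
        step = max(-1.0, min(1.0, grad / info))
        beta += step
        if abs(step) < tol:
            break
    # Wald test; info is evaluated at the last pre-step beta, which equals
    # the final beta up to tol once the loop has converged.
    z = beta * math.sqrt(info) if info > 1e-12 else 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, math.exp(beta), p_value


# Toy data (hypothetical): follow-up time, event flag (1 = death, 0 = censored).
time = [2, 3, 5, 7, 8, 11, 13, 17, 19, 23]
event = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0]
expression = {
    "GENE_A": [2.9, 2.1, 2.5, 1.2, 1.9, 1.0, 1.4, 0.8, 1.1, 0.3],
    "GENE_B": [1.0, 2.0, 0.5, 1.5, 2.5, 0.7, 1.8, 2.2, 0.9, 1.3],
}

# Screen each gene separately and keep those below the assumed cutoff.
results = {g: univariate_cox(v, time, event) for g, v in expression.items()}
selected = [g for g, (_, _, p) in results.items() if p < 0.05]
```

Here `results` maps each gene to its coefficient, hazard ratio, and p-value, and `selected` holds the genes that would pass on to feature selection. A hazard ratio above 1 indicates that higher expression is associated with shorter survival in the toy data.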
Results
In hepatocellular carcinoma (HCC), we identified 407 differentially expressed metabolism-related genes. After Cox regression and prognosis-related analysis, we screened out 561 differentially expressed genes related to prognosis, and used random survival forest and LASSO regression algorithms to select the most important features, ultimately obtaining 7 metabolic features with significant predictive power. We then rebuilt the random survival forest model on these 7 metabolic features and assessed its predictive performance with ROC curves (AUC > 0.89 at 1 to 5 years). Applied across cancer types, the prognostic model showed good predictive performance in 10 of the 33 cancer types in the TCGA database (concordance index, C-index > 0.75; integrated Brier score, IBS < 0.25), supporting the potential value of metabolic features as prognostic markers in cancer.
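To make the C-index threshold concrete, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is not the evaluation code used in the study: the function name `concordance_index` and the toy inputs are assumptions for illustration. A pair of patients is comparable when the one with the shorter follow-up time had an observed event, and it is concordant when that patient also received the higher predicted risk.

```python
def concordance_index(time, event, risk):
    """Harrell's C-index: the fraction of comparable pairs ranked correctly.

    time  : follow-up times
    event : 1 if the event was observed, 0 if censored
    risk  : predicted risk scores (higher means worse expected prognosis)
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored sample cannot be the earlier member of a pair
        for j in range(n):
            if time[i] < time[j]:  # i had its event before j's follow-up ended
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy check (hypothetical data): risks perfectly ordered against survival time.
c_perfect = concordance_index([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 2, 1])
# Same data with the risk ordering reversed.
c_reversed = concordance_index([1, 2, 3, 4], [1, 1, 1, 1], [1, 2, 3, 4])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a threshold of 0.75 means that three out of four comparable patient pairs are ordered correctly by the model.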
Conclusion
Through a comprehensive analysis of gene expression data and clinical information from public databases, this study constructs an effective pan-cancer prognostic model that can predict the prognosis of cancer patients. We also observed variation in several metabolic features across cancer types, which offers new insight into predicting molecular subtypes and responses to different treatment plans. These findings provide a reference for individualized treatment decisions and precision medicine for cancer patients, and contribute new ideas and methods to the field of metabolomics.