Background
Stomach adenocarcinoma (STAD) is a highly heterogeneous disease and is among the leading causes of cancer-related death worldwide. At present, TNM stage remains the most effective prognostic factor for STAD. Exploring the changes in gene expression levels associated with TNM stage development may help oncologists to better understand the commonalities in the progression of STAD and may provide a new way of identifying early-stage STAD so that optimal treatment approaches can be provided.
Methods
RNA expression profiles were retrieved from two large STAD microarray datasets (GSE62254, n = 300; GSE15459, n = 192) in the Gene Expression Omnibus (GEO) and from the RNA-seq dataset of The Cancer Genome Atlas (TCGA, n = 375). All expression data were obtained from STAD tissues after radical resection. After excluding samples with insufficient staging information or lymph node counts, samples were grouped into earlier-stage and later-stage groups. Samples in GSE62254 were randomly divided into a training group (n = 172) and a validation group (n = 86). Differentially expressed genes (DEGs) were selected based on mRNA expression in the training group and the TCGA group (n = 156), and hub genes were further screened by least absolute shrinkage and selection operator (LASSO) logistic regression. Receiver operating characteristic (ROC) curves were used to evaluate how well the hub genes distinguished STAD stage in the validation group and the GSE15459 dataset. Univariate and multivariate Cox regressions were then performed sequentially.
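As a minimal illustration of the ROC evaluation step, the area under the curve (AUC) of a gene-based score can be read as the probability that a randomly chosen later-stage sample scores higher than a randomly chosen earlier-stage sample. The sketch below computes it in that pairwise form. The helper name `auc_from_scores` and the toy scores and labels are assumptions made for illustration; they are not study data and not necessarily the software the authors used.

```python
def auc_from_scores(scores, labels):
    """AUC in its pairwise (Mann-Whitney) form; tied pairs count as 0.5.

    labels: 1 = later-stage, 0 = earlier-stage (hypothetical coding).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample from each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # later-stage sample ranked above earlier-stage
            elif p == n:
                wins += 0.5   # tie contributes half a win
    return wins / (len(pos) * len(neg))


# Toy values for illustration only, not taken from the study.
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.6, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = auc_from_scores(toy_scores, toy_labels)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two stage groups.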
Results
Twenty-two DEGs were consistently upregulated (n = 19) or downregulated (n = 3) in both the training and TCGA datasets. Nine genes (MYOCD, GHRL, SCRG1, TYRP1, LYPD6B, THBS4, TNFRSF17, SERPINB2, and NEBL) were identified as hub genes by LASSO logistic regression. The nine-gene model discriminated earlier-stage from later-stage STAD in the validation group (AUC = 0.704), the training-validation group (AUC = 0.743), and the GSE15459 dataset (AUC = 0.658). Gene Set Enrichment Analysis (GSEA) identified potential pathways associated with stage progression, including the PI3K-Akt and calcium signaling pathways. Univariate Cox regression indicated that the nine-gene score was a significant risk factor for overall survival (HR = 1.28, 95% CI 1.08–1.50, P = 0.003). In the multivariate Cox regression, only SCRG1 remained an independent prognostic predictor of overall survival after backward stepwise elimination (HR = 1.21, 95% CI 1.11–1.32, P < 0.001).
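The reported hazard ratio, confidence interval, and P value for the nine-gene score can be checked against one another. The sketch below assumes a Wald-type 95% interval that is symmetric on the log-hazard scale, which is a common convention but is not stated in the text. The function name `wald_from_hr` is hypothetical. Because the published values are rounded, the recomputed P value should only be expected to land close to the reported one.

```python
import math


def wald_from_hr(hr, ci_low, ci_high, z_crit=1.96):
    """Recover the log-hazard coefficient, its standard error, and a
    two-sided Wald P value from a hazard ratio and its 95% CI.

    Assumes the CI is exp(beta +/- z_crit * se), an assumption about how
    the interval was built.
    """
    beta = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p_value


# Values as reported for the nine-gene score in univariate Cox regression.
beta, se, p_value = wald_from_hr(1.28, 1.08, 1.50)
```

Under this assumption the recomputed P value is of the same order as the reported P = 0.003, so the three reported quantities appear mutually consistent.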
Conclusion
Through a series of bioinformatics and validation processes, a nine-gene signature that can distinguish STAD stage was identified. This gene signature has potential clinical application and may provide a novel approach to understanding the progression of STAD.