Background: The pathological stage of colon cancer cannot accurately predict recurrence, and to date no gene expression signature has proven reliable for prognostic stratification in clinical practice, perhaps because colon cancer is a heterogeneous disease. The purpose of this study was to establish a comprehensive molecular classification and a prognostic marker for colon cancer based on invasion-related expression profiling.

Methods: We collected two microarray datasets of colon cancer samples from the Gene Expression Omnibus (GEO) database and obtained a third dataset from The Cancer Genome Atlas (TCGA). Differentially expressed genes (DEGs) underwent univariate analysis, least absolute shrinkage and selection operator (LASSO) regression analysis, and multivariate Cox survival analysis to screen for prognosis-associated feature genes, which were then verified in test datasets.

Results: Two molecular subtypes (C1 and C2) were identified based on invasion-related genes in the colon cancer samples of the TCGA training dataset, and C2 had a good prognosis. Moreover, C1 was more sensitive to immunotherapy. A total of 1,514 invasion-related genes were identified as DEGs between C1 and C2, comprising 124 downregulated and 1,390 upregulated genes. A four-gene prognostic signature was identified and validated, and colon cancer patients were stratified into a high-risk group and a low-risk group. Multivariate regression analyses and a nomogram indicated that the four-gene signature developed in this study was an independent predictive factor and retained relatively good predictive capability after adjusting for other clinical factors.

Conclusion: This research provides novel insights into the mechanisms underlying invasion and offers a novel biomarker of poor prognosis in colon cancer patients.
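
The stratification into high-risk and low-risk groups can be illustrated with a minimal sketch. It assumes the common convention that a signature's risk score is a linear combination of gene expression values weighted by Cox regression coefficients, and that patients are split at the median score; the abstract states neither the formula nor the cutoff. The gene names (`GENE_A` to `GENE_D`) and coefficients are hypothetical placeholders, not the four genes or values from the study.

```python
from statistics import median

# Hypothetical coefficients for illustration only; these are NOT the
# study's genes or fitted values.
COEFS = {"GENE_A": 0.5, "GENE_B": -0.25, "GENE_C": 0.75, "GENE_D": 0.125}


def risk_score(expr, coefs=COEFS):
    """Linear risk score: sum of coefficient * expression over signature genes."""
    return sum(coefs[gene] * expr[gene] for gene in coefs)


def stratify(samples, coefs=COEFS):
    """Score every sample and split at the median score (an assumed cutoff).

    samples maps a sample ID to a dict of gene -> expression value.
    Returns (scores, cutoff, groups), where groups maps each sample ID
    to "high" (score above the cutoff) or "low" (otherwise).
    """
    scores = {sid: risk_score(expr, coefs) for sid, expr in samples.items()}
    cutoff = median(scores.values())
    groups = {sid: "high" if s > cutoff else "low" for sid, s in scores.items()}
    return scores, cutoff, groups
```

In practice the coefficients would come from the fitted multivariate Cox model on the training dataset, and the same coefficients and cutoff rule would then be applied unchanged to the test datasets.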