Aims
This study aimed to investigate the pattern of distant metastasis in newly diagnosed colorectal cancer (CRC) and to construct and validate a prognostic nomogram for predicting overall survival (OS) and cancer-specific survival (CSS) in CRC patients with distant metastases.

Methods
Patients with primary CRC initially diagnosed from 2010 to 2016 in the SEER database were included in the analysis. Independent risk factors affecting OS, CSS, all-cause mortality, and CRC-specific mortality were screened using Cox regression and the Fine–Gray competing-risk model. Nomogram models were constructed to predict the OS and CSS of the patients. The reliability and accuracy of the prediction models were evaluated by the concordance index (C-index) and calibration curves. The microarray dataset GSE41258 was downloaded from the GEO database, and differentially expressed genes (DEGs) were screened with the GEO2R online tool (p < 0.05, |logFC| > 1.5). Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis and Gene Ontology (GO) annotation were used for enrichment analysis of the DEGs, and the STRING database was used for protein–protein interaction (PPI) analysis. Cytoscape software was used to construct the PPI network and to screen functional modules and hub genes.

Results
A total of 57,835 CRC patients were identified, including 47,823 without distant metastases and 10,012 (17.31%) with metastases. Older age, unmarried status, poorly differentiated or undifferentiated grade, right colon site, larger tumor size, N2 stage, more metastatic sites, and elevated carcinoembryonic antigen (CEA) were associated with poorer prognosis (all p < 0.01). The independent risk factors for OS and CSS were used to construct prognostic models predicting OS and CSS in CRC patients with distant metastasis. The C-index and calibration curves of the training and validation groups showed that the models had acceptable predictive performance and good calibration.
Furthermore, by comparing CRC tissues with and without liver metastasis, 158 DEGs and the top 10 hub genes were identified. The hub genes were mainly related to liver function and coagulation function.

Conclusion
Large-scale data from public databases were analyzed and translated into a prognostic evaluation tool that could be applied in the clinic, which has clinical significance for treatment planning and prognostic evaluation in CRC patients with distant metastasis.
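For readers unfamiliar with the C-index used above to assess discrimination, the following is a minimal illustrative sketch of Harrell's concordance index in Python. It is not the study's own code. The function name `harrell_c_index`, the toy data, and the choice to skip pairs with tied follow-up times are all assumptions made for illustration. A pair of patients is treated as comparable only when the patient with the shorter follow-up time experienced the event. The pair counts as concordant when that patient also has the higher predicted risk.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Simplified Harrell's C-index (illustrative sketch, not the study's code).

    times       : follow-up time per patient
    events      : 1 if the event (death) was observed, 0 if censored
    risk_scores : model-predicted risk (higher = worse expected survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        # Let `a` be the patient with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk died earlier: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data (made up for illustration, not SEER records).
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.4, 0.1]
c_index = harrell_c_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk. In practice an established survival-analysis library would normally be used instead of a hand-written loop.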