Background
The liver is the most common site of distant metastasis in rectal cancer, and liver metastasis dramatically affects patients' treatment strategy. This study aimed to develop and validate a clinical prediction model based on machine learning algorithms to predict the risk of liver metastasis in patients with rectal cancer.
Methods
We integrated two rectal cancer cohorts from 2010 to 2017: one from the Surveillance, Epidemiology, and End Results (SEER) database and one from multiple hospitals in China. We built and validated liver metastasis prediction models for rectal cancer using six machine learning algorithms: random forest (RF), light gradient boosting machine (LGBM), extreme gradient boosting (XGB), multilayer perceptron (MLP), logistic regression (LR), and K-nearest neighbors (KNN). The models were evaluated with several metrics, including the area under the curve (AUC), accuracy, sensitivity, specificity, and F1 score. Finally, we created a web calculator based on the best model.
Results
The study cohort consisted of 19,958 patients from the SEER database and 924 patients from two hospitals in China. The AUC values of the six prediction models ranged from 0.70 to 0.95. The XGB model showed the best predictive performance, with the following metrics in the internal test set: AUC 0.918, accuracy 0.884, sensitivity 0.721, and specificity 0.787. In the external test set, the XGB model achieved an AUC of 0.926, accuracy of 0.919, sensitivity of 0.740, and specificity of 0.765. The XGB model also showed good fit on the calibration and decision curves for both the internal test set and the external validation set.
Finally, we constructed an online web calculator using the XGB model to facilitate broader use of the model and to better assist physicians in decision-making.
Conclusion
We successfully developed an XGB-based machine learning model to predict liver metastasis in rectal cancer and validated it with a real-world dataset. We also developed a web-based predictor to better guide clinical diagnosis and treatment strategies.
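To make the evaluation metrics named in Methods concrete, here is a minimal, stdlib-only Python sketch. It is not the authors' code. It assumes binary labels (1 = liver metastasis, 0 = none), predicted probabilities from any one of the six models, and a hypothetical classification threshold of 0.5. The function names are illustrative. AUC is computed in its rank (Mann-Whitney) form: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels (1 = liver metastasis)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def auc_rank(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Compute AUC, accuracy, sensitivity, specificity, and F1.

    Assumes both classes are present and at least one positive prediction,
    so none of the denominators below is zero. The 0.5 threshold is a
    hypothetical default, not a value reported in the study.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)    # positive predictive value
    return {
        "auc": auc_rank(y_true, scores),
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


if __name__ == "__main__":
    # Toy example: five patients, two with liver metastasis.
    labels = [1, 1, 0, 0, 0]
    probs = [0.9, 0.4, 0.6, 0.2, 0.1]
    print(evaluate(labels, probs))
```

With the toy data, the 0.5 threshold yields one true positive, one false negative, one false positive, and two true negatives. This gives sensitivity 0.5, specificity 2/3, accuracy 0.6, and F1 0.5. The AUC is 5/6 because five of the six positive-negative pairs are ranked correctly. The threshold affects every metric except AUC, which is threshold-free.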