Objective: To investigate the tongue image features of patients with lung cancer and patients with benign pulmonary nodules, and to construct a lung cancer risk warning model using machine learning methods.

Methods: From July 2020 to March 2022, we enrolled 862 participants: 263 patients with lung cancer, 292 patients with benign pulmonary nodules, and 307 healthy subjects. Tongue images were captured with the TFDA-1 digital tongue diagnosis instrument, and feature extraction was used to obtain tongue image indices. We analyzed the statistical characteristics and correlations of these indices and used six machine learning algorithms to build lung cancer prediction models on different data sets (baseline data, tongue image data, and both combined).

Results: The statistical characteristics and correlations of the tongue image data differed between patients with benign pulmonary nodules and patients with lung cancer. Among the models based on tongue image data alone, the random forest model performed best, with an accuracy of 0.679 ± 0.048 and an AUC of 0.752 ± 0.051. For the models based on both baseline and tongue image data, the accuracies of the logistic regression, decision tree, SVM, random forest, neural network, and naïve Bayes models were 0.760 ± 0.021, 0.764 ± 0.043, 0.774 ± 0.029, 0.770 ± 0.050, 0.762 ± 0.059, and 0.709 ± 0.052, respectively, and the corresponding AUCs were 0.808 ± 0.031, 0.764 ± 0.033, 0.755 ± 0.027, 0.804 ± 0.029, 0.777 ± 0.044, and 0.795 ± 0.039.

Conclusion: Tongue diagnosis data collected under the guidance of traditional Chinese medicine (TCM) diagnostic theory were useful for lung cancer risk prediction. Models built on both tongue image and baseline data outperformed models built on tongue image data or baseline data alone. Adding objective tongue image data to baseline data can significantly improve the performance of lung cancer prediction models.
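The Results report each metric as mean ± standard deviation, but the abstract does not name the validation scheme. The sketch below assumes stratified k-fold cross-validation and shows how per-fold accuracy and AUC can be summarized that way for a baseline-only, a tongue-only, and a combined feature set. It uses only one of the six algorithms (logistic regression, written from scratch in standard-library Python). The synthetic data, the column split, and every function name (`make_toy_data`, `cross_validate`, and so on) are hypothetical illustrations, not the study's data or code.

```python
import math
import random
from statistics import mean, stdev


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Predicted probability of the positive class (label 1)."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Unregularized logistic regression fitted by batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # gradient of the log-loss w.r.t. z
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation; ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(y, k, seed=0):
    """Split indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def cross_validate(X, y, k=5, seed=0):
    """Return (mean, SD) of accuracy and of AUC across the k test folds."""
    accs, aucs = [], []
    for fold in stratified_folds(y, k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        w, b = fit_logistic([X[i] for i in train], [y[i] for i in train])
        scores = [predict_proba(w, b, X[i]) for i in fold]
        y_test = [y[i] for i in fold]
        accs.append(accuracy(y_test, [int(s >= 0.5) for s in scores]))
        aucs.append(roc_auc(y_test, scores))
    return (mean(accs), stdev(accs)), (mean(aucs), stdev(aucs))


def make_toy_data(n=200, seed=1):
    """Hypothetical synthetic data: column 0 stands in for a baseline variable,
    columns 1-2 for tongue image indices. Not the study's data."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(0.6 * label, 1.0),
                  rng.gauss(0.5 * label, 1.0),
                  rng.gauss(0.4 * label, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    subsets = {"baseline": [0], "tongue": [1, 2], "combined": [0, 1, 2]}
    for name, cols in subsets.items():
        Xs = [[row[c] for c in cols] for row in X]
        (acc_m, acc_s), (auc_m, auc_s) = cross_validate(Xs, y)
        print(f"{name:9s} accuracy {acc_m:.3f} ± {acc_s:.3f}  AUC {auc_m:.3f} ± {auc_s:.3f}")
```

Swapping in another of the six classifiers (decision tree, SVM, random forest, neural network, or naïve Bayes) would change only the fit-and-score step inside `cross_validate`; the fold assignment and the mean ± SD summary stay the same, which keeps the comparison across feature sets like-for-like.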