The thermal comfort of passengers in subway carriages cannot be ignored. This research therefore aims to establish a prediction model for the thermal comfort of the internal environment of a subway car and to find the optimal combination of input variables for predicting the predicted mean vote (PMV) index. The data-driven models were built from experimental and questionnaire data collected in the Nanjing Metro. Four models were built using support vector machine (SVM), decision tree (DT), random forest (RF), and logistic regression (LR). All possible combinations of 11 input variables were evaluated to determine the most accurate model, with variable selection for each model comprising 102 350 iterations. In the PMV prediction, the RF model performed best when the squared correlation coefficient (R²) was used as the evaluation indicator (R²: 0.7680; mean squared error (MSE): 0.2868). The selected variables were clothing temperature (CT), convective heat transfer coefficient between the surface of the human body and the environment (CHTC), black bulb temperature (BBT), and thermal resistance of clothes (TROC). The RF model also achieved the highest accuracy when MSE was used as the evaluation index (R²: 0.7676; MSE: 0.2836). In this case, the selected variables were clothing surface area coefficient (CSAC), CT, BBT, and air velocity (AV). The results show that the RF model can efficiently predict the PMV of the subway car environment.
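The exhaustive variable selection described above can be illustrated with a minimal sketch. It rests on several assumptions: the data are synthetic, the four variable names (including the irrelevant `NOISE` column) are hypothetical, and a plain least-squares fit (`fit_ols`) stands in for the RF model so that the example needs only the Python standard library. It is not the authors' pipeline. For reference, 11 variables yield 2^11 - 1 = 2047 non-empty combinations, and 102 350 = 50 × 2047, although the abstract does not state how the iterations are divided.

```python
from itertools import combinations
import random


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def fit_ols(X, y):
    """Stand-in model (NOT the paper's RF): least squares with intercept,
    solved from the normal equations by Gauss-Jordan elimination."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coef, X):
    """Apply fitted coefficients (intercept first) to each row of X."""
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in X]


def all_subsets(names):
    """Yield every non-empty combination of the given variable names."""
    for size in range(1, len(names) + 1):
        yield from combinations(names, size)


# Synthetic data (assumption): the target depends on CT, BBT and AV only.
random.seed(0)
n = 200
data = {
    "CT": [random.uniform(20, 35) for _ in range(n)],
    "BBT": [random.uniform(20, 35) for _ in range(n)],
    "AV": [random.uniform(0, 2) for _ in range(n)],
    "NOISE": [random.uniform(0, 1) for _ in range(n)],
}
target = [0.3 * ct + 0.2 * bbt - 1.0 * av - 12 + random.gauss(0, 0.1)
          for ct, bbt, av in zip(data["CT"], data["BBT"], data["AV"])]
train, test = range(0, 140), range(140, n)  # simple hold-out split

# Fit and score one model per variable combination.
results = []
for subset in all_subsets(list(data)):
    X_tr = [[data[v][i] for v in subset] for i in train]
    X_te = [[data[v][i] for v in subset] for i in test]
    coef = fit_ols(X_tr, [target[i] for i in train])
    y_te = [target[i] for i in test]
    y_hat = predict(coef, X_te)
    results.append((subset, r2(y_te, y_hat), mse(y_te, y_hat)))

best = min(results, key=lambda r: r[2])  # lowest hold-out MSE
print(len(results), "subsets evaluated")
print("best by MSE:", best[0], "R2=%.4f MSE=%.4f" % best[1:])
```

Replacing `fit_ols` and `predict` with any other regressor leaves the search loop unchanged, which is why the same enumeration can be reused across several model types.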