In the middle and late stages of heavy oil development, formulating a scientific and reasonable development plan is key to improving oilfield efficiency. Steam stimulation remains the main development method for heavy oil. Production under steam stimulation is constrained not only by boiler conditions, surface pipelines, and wellbore conditions but also by the steam absorption capacity of the formation. Analyzing any one of these components in isolation therefore cannot optimize the steam stimulation process as a whole. Mechanistic models are the most commonly used method for predicting heavy oil production, but their many idealized assumptions make the predictions differ considerably from actual production. With the rapid development of machine learning, production can be predicted quickly from field data. However, when the field data cover only a narrow range of a parameter, the model generalizes poorly and overfits. Against this background, this paper presents a coupled study of surface steam pipeline flow, steam injection wellbore flow, and formation flow from a data-driven perspective. First, the features affecting liquid production and water content are ranked by importance using the correlation coefficient and Random Forest feature selection. Second, five typical machine learning algorithms are compared to select the optimal prediction model and the optimal feature set for the samples in this paper. Finally, to address the poor generalization ability of the prediction model, additional samples are generated from the mechanistic model to increase the diversity of steam dryness values. After retraining on the new samples, both the accuracy and the generalization ability of the optimal prediction model improve.
This paper offers a new approach to production prediction for heavy oil reservoirs under steam stimulation, which supports the efficient development of such reservoirs.
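As an illustration of the correlation-based part of the feature ranking step, the following minimal sketch ranks candidate features by the absolute value of their Pearson correlation coefficient with liquid production. It is not the implementation used in this paper: the feature names (`steam_dryness`, `injection_rate`, `soak_time`), the synthetic data, and the helper functions `pearson` and `rank_features` are all assumptions made for the example, and the Random Forest importance ranking is not reproduced here.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear information
    return cov / math.sqrt(var_x * var_y)


def rank_features(features, target):
    """Sort feature names by |Pearson r| with the target, strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic, purely illustrative data (hypothetical names, units, and ranges).
rng = random.Random(0)
n_samples = 200
features = {
    "steam_dryness": [rng.uniform(0.4, 0.9) for _ in range(n_samples)],
    "injection_rate": [rng.uniform(100.0, 300.0) for _ in range(n_samples)],
    "soak_time": [rng.uniform(2.0, 7.0) for _ in range(n_samples)],
}

# Assumed toy relationship: production depends on dryness and injection rate,
# plus noise; soak_time is deliberately left out so it should rank last.
liquid_production = [
    50.0 * dryness + 0.05 * rate + rng.gauss(0.0, 1.0)
    for dryness, rate in zip(features["steam_dryness"], features["injection_rate"])
]

ranking = rank_features(features, liquid_production)
for name, score in ranking:
    print(f"{name}: |r| = {score:.3f}")
```

The Pearson coefficient captures only linear dependence, which is one reason a tree-based importance measure is a useful complement when the relationship between a feature and production is nonlinear.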