Given increasing food demand and the continuing loss of available farmland, reliable and near-real-time wheat yield forecasts are essential to securing regional and global food supplies. Although crop models have been widely used for yield estimation, their applicability to large-scale yield prediction is limited by the large amount of data required for parameterization. Focusing on the main winter wheat growing areas of China, we developed an ensemble learning framework based on seven machine learning algorithms, including extreme gradient boosting, random forest, and support vector regression. The model used satellite vegetation index time series, climate, soil properties, and elevation data to provide county-level winter wheat yield forecasts from 2001 to 2015. The ensemble explained 86% of the yield variability and outperformed all base learners. The correlation between the base learners' predictions suggests that the ensemble's prediction performance still has room for improvement. Soil properties and elevation data effectively improved model performance because they contained yield-relevant information that vegetation index and climate data could not fully capture. As the growing season progressed and more data accumulated, the unique contribution of climate data to yield forecasts consistently exceeded that of the vegetation index, especially early in the growing season. Furthermore, we evaluated the model’s ability to perform within-season prediction; it achieved satisfactory accuracy 2 months before harvest (R² = 0.85, RMSE = 480 kg/ha, MAPE = 7.52%). The yield forecasting framework established in this research can be applied to other crop varieties and regions and can provide stakeholders with sufficiently accurate yield predictions.
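For readers unfamiliar with the three accuracy measures reported above (R2, RMSE, and MAPE), the following minimal Python sketch shows how they are commonly computed. It assumes the standard definitions (R2 as one minus the ratio of residual to total sum of squares); the abstract does not state which formulation the study used. The function names and the yield values are hypothetical and chosen only for illustration; they are not data from the study.

```python
import math


def r2_score(observed, predicted):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the yields (e.g. kg/ha)."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed values must be nonzero)."""
    relative_errors = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(relative_errors) / len(observed)


# Hypothetical county-level yields in kg/ha (illustrative only, not study data).
observed = [5000.0, 6000.0, 4000.0, 5500.0]
predicted = [5200.0, 5700.0, 4400.0, 5500.0]

print("R2:", r2_score(observed, predicted))
print("RMSE (kg/ha):", rmse(observed, predicted))
print("MAPE (%):", mape(observed, predicted))
```

RMSE is expressed in yield units and so depends on the scale of the crop, whereas MAPE is relative, which is why reporting both alongside R2 gives a fuller picture of forecast quality.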