Accurately estimating regional‐scale crop yields is essential for assessing current agricultural production performance and for managing agricultural land effectively. The Yuncheng Basin is an important grain‐producing area in Shanxi Province. This study used Sentinel‐2A imagery with a spatial resolution of 10 m and MODIS imagery with a temporal resolution of 1 d, both acquired in 2020. The spatial and temporal nonlocal filter‐based fusion model (STNLFFM) was used to obtain fused data with a spatial resolution of 10 m and a temporal resolution of 1 d, which were combined with the Carnegie–Ames–Stanford Approach (CASA) and a light‐use efficiency model to estimate summer maize (Zea mays L.) yield. The results showed that the fused normalized difference vegetation index (NDVI) inherited the spatial details of the Sentinel‐2A NDVI and expressed the spatial differences between smaller features more effectively. The STNLFFM NDVI curve was consistent with the actual growth condition of summer maize, accurately reflecting the NDVI trend and local abrupt changes during the growth period. Moreover, the fused NDVI was influenced by topographic differences and artificial irrigation: the summer maize yield in the mountainous and plateau areas of the Yuncheng Basin was <5,000 kg ha⁻¹, whereas that in the alluvial plain of the Sushui River reached 8,000 kg ha⁻¹. The accuracy of the yield estimation model based on STNLFFM NDVI (mean absolute percentage error [MAPE] = 5.47%, −13.74% ≤ relative error [RE] ≤ 0.12%) was significantly higher than that of the model based on MODIS NDVI (MAPE = 15.65%, −19.67% ≤ RE ≤ 20.88%), indicating that spatio‐temporal fusion technology can effectively improve the accuracy of summer maize yield estimation.
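The quantities reported above (NDVI, RE, and MAPE) can be illustrated with a minimal Python sketch. The abstract does not state the formulas, so the sketch assumes the conventional definitions: NDVI = (NIR − Red) / (NIR + Red), RE = 100 × (estimated − observed) / observed, and MAPE as the mean of the absolute RE values. The function names and the sample yields are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def ndvi(nir: float, red: float) -> float:
    """Conventional NDVI from near-infrared and red reflectance (assumed definition)."""
    total = nir + red
    if total == 0:
        raise ValueError("NIR + Red must be non-zero")
    return (nir - red) / total


def relative_errors(estimated: Sequence[float], observed: Sequence[float]) -> List[float]:
    """Signed relative error in percent for each plot (assumed definition of RE)."""
    if len(estimated) != len(observed) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return [100.0 * (e - o) / o for e, o in zip(estimated, observed)]


def mape(estimated: Sequence[float], observed: Sequence[float]) -> float:
    """Mean absolute percentage error: the mean of |RE| over all plots."""
    errors = relative_errors(estimated, observed)
    return sum(abs(r) for r in errors) / len(errors)


# Hypothetical yields in kg per hectare, for illustration only.
observed_yield = [8000.0, 5000.0, 6000.0]
estimated_yield = [7600.0, 5000.0, 6300.0]

re_values = relative_errors(estimated_yield, observed_yield)
print("RE range (%):", min(re_values), "to", max(re_values))
print("MAPE (%):", round(mape(estimated_yield, observed_yield), 2))
```

Under these definitions, a negative RE means the model underestimates the observed yield, so an RE range that lies almost entirely below zero, such as the one reported for the STNLFFM model, would indicate a tendency toward underestimation.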