Early and accurate prediction and simulation of grain crop yield can inform the revision and development of regional food policy, which is crucial for ensuring national food security. Unmanned aerial vehicle (UAV) technology is gradually gaining an advantage over satellite remote sensing at the field scale. In this study, we predicted maize yield from canopy vegetation indices (VIs) and crop phenology metrics obtained from UAV imagery, using ordinary least squares (OLS), stepwise multiple linear regression (SMLR), and gradient-boosted regression tree (GBRT) models. The results show that the VIs extracted from UAV imagery were highly correlated with yield (R = 0.92), which facilitates crop yield estimation. Additionally, coupling crop phenology with the VIs significantly improved the prediction accuracy of SMLR, which reached its highest R² (0.894) and lowest RMSE (1.238 × 10³ kg ha⁻¹). However, the improvement this coupling brought to GBRT was slight. The GBRT simulation outperformed OLS and SMLR, with R², RMSE, and MAE of 0.892, 1.189 × 10³ kg ha⁻¹, and 9.150 × 10² kg ha⁻¹, respectively. Moreover, the blister stage was identified as the optimal stage for maize yield prediction, with an accuracy rate exceeding 81%. These results demonstrate the feasibility of using UAV images to predict crop yields and provide an important reference at the field scale.
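For readers unfamiliar with the three accuracy metrics reported above, the following is a minimal sketch of how R², RMSE, and MAE are commonly computed from observed and predicted yields. It assumes R² is the coefficient of determination (1 minus the ratio of residual to total sum of squares); the study may define it differently, for example as a squared correlation coefficient. The function names and the sample yields are hypothetical and are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the same units as the yields (e.g. kg/ha)."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def mae(observed, predicted):
    """Mean absolute error, in the same units as the yields."""
    absolute_errors = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute_errors) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical plot-level maize yields in kg/ha, for illustration only.
observed_yield = [8000.0, 9000.0, 10000.0, 11000.0]
predicted_yield = [8500.0, 8500.0, 10500.0, 10500.0]

print("RMSE:", rmse(observed_yield, predicted_yield))
print("MAE:", mae(observed_yield, predicted_yield))
print("R2:", r_squared(observed_yield, predicted_yield))
```

RMSE and MAE carry the units of yield, which is why they are reported in kg ha−1, whereas R² is dimensionless.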