Large-scale yield estimation in the field or plot during wheat grain filling can contribute to high-throughput plant phenotyping and precision agriculture. To overcome the challenges of poor yield estimation at a large scale and for multiple species, this study employed a combination of multispectral and RGB drones to capture images and generation of time-series data on vegetation indices and canopy structure information during the wheat grubbing period. Five machine learning methods, partial least squares, random forest, support vector regression machine, BP neural networks, and long and short-term memory networks were used. The yield estimation of wheat grain filling period data was executed using a long and short-term memory network based on the preferred machine learning model, with a particular focus on distinguishing different heat-tolerant genotypes of wheat. The results unveiled a declining trend in the spectral reflectance characteristics of vegetation indices as the filling period progressed. Among the time-series data of the wheat filling period, the long and short-term memory network exhibited the highest estimation effectiveness, surpassing the BP neural network, which displayed the weakest estimation performance, by an impressive improvement in R2 of 0.21. The three genotypes of wheat were categorized into heat-tolerant genotype, moderate heat-tolerant genotype, and heat-sensitive genotype. Subsequently, the long and short-term memory network, which exhibited the most accurate yield estimation effect, was selected for regression prediction. The results indicate that the yield estimation effect was notably better than that achieved without distinguishing genotypes. Among the wheat genotypes, the heat-sensitive genotype demonstrated the most accurate prediction with an R2 of 0.91 and RMSE% of 3.25%. Moreover, by fusing the vegetation index with canopy structure information, the yield prediction accuracy (R2) witnessed an overall enhancement of about 0.07 compared to using the vegetation index alone. This approach also displayed enhanced adaptability to spatial variation. In conclusion, this study successfully utilized a cost-effective UAV for data fusion, enabling the extraction of canopy parameters and the application of a long and short-term memory network for yield estimation in wheat with different heat-tolerant genotypes. These findings have significant implications for informed crop management decisions, including harvesting and contingency forecasting, particularly for vast wheat areas.