Assessing crop production in the field often requires breeders to wait until the end of the season to collect yield-related measurements, limiting the pace of the breeding cycle. Early prediction of crop performance can reduce this constraint by allowing breeders more time to focus on the highest-performing varieties. Here, we present a multimodal deep learning model for predicting the performance of maize (Zea mays) at an early developmental stage, offering the potential to accelerate crop breeding. We employed multispectral images and eight vegetation indices, collected by an uncrewed aerial vehicle approximately 60 days after sowing, over three consecutive growing cycles (2017, 2018 and 2019). The multimodal deep learning approach was used to integrate field management and genotype information with the multispectral data, providing context to the conditions that the plants experienced during the trial. Model performance was assessed using holdout data, in which the model accurately predicted the yield (RMSE 1.07 t/ha, a relative RMSE of 7.60% of 16 t/ha, and R2 score 0.73) and identified the majority of high-yielding varieties, outperforming previously published models for early yield prediction. The inclusion of vegetation indices was important for model performance, with a normalized difference vegetation index and green with normalized difference vegetation index contributing the most to model performance. The model provides a decision support tool, identifying promising lines early in the field trial.