The accurate prediction of crop yields is crucial for enhancing agricultural efficiency and ensuring food security. This study assesses the performance of the CNN-LSTM-Attention model in predicting the yields of maize, rice, and soybeans in Northeast China and compares its effectiveness with traditional models such as RF, XGBoost, and CNN. Utilizing multi-source data from 2014 to 2020, which include vegetation indices, environmental variables, and photosynthetically active parameters, our research examines the model’s capacity to capture essential spatial and temporal variations. The CNN-LSTM-Attention model integrates Convolutional Neural Networks, Long Short-Term Memory, and an attention mechanism to effectively process complex datasets and manage non-linear relationships within agricultural data. Notably, the study explores the potential of using kNDVI for predicting yields of multiple crops, highlighting its effectiveness. Our findings demonstrate that advanced deep-learning models significantly enhance yield prediction accuracy over traditional methods. We advocate for the incorporation of sophisticated deep-learning technologies in agricultural practices, which can substantially improve yield prediction accuracy and food production strategies.