Breakthroughs in deep learning for image processing and language modeling are gradually unlocking the technology's potential applications in education. In particular, for understanding and analyzing educational image content, deep learning opens a new path toward recommending personalized learning trajectories. This study aims to construct a system that interprets educational image content using deep learning and recommends personalized learning paths based on that content. First, an end-to-end visual narrative framework is proposed that integrates the Bidirectional Encoder Representations from Transformers (BERT) model, attention mechanisms, and hierarchical Long Short-Term Memory (LSTM) models to deepen the understanding of educational image content. Second, a recommendation model based on multi-feature Latent Dirichlet Allocation (LDA) is developed; it learns the correspondences among the various features of different educational images, thereby supporting accurate recommendation of personalized learning paths. Existing research commonly overlooks comprehensive consideration of both the semantic layers of images and educational backgrounds; the proposed method is designed to bridge that gap. Results indicate that the system can effectively understand educational image content and provide precise learning path recommendations based on learner characteristics, which promises to significantly improve learning efficiency and quality.