With the impact of COVID-19 on education, online education is booming, enabling learners to access various courses. However, due to the overload of courses and redundant information, it is challenging for users to quickly locate courses they are interested in when faced with a massive number of courses. To solve this problem, we propose a deep course recommendation model with multimodal feature extraction based on the Long- and Short-Term Memory network (LSTM) and Attention mechanism. The model uses course video, audio, and title and introduction for multimodal fusion. To build a complete learner portrait, user demographic information, explicit and implicit feedback data were added. We conducted extensive and exhaustive experiments based on real datasets, and the results show that the AUC obtained a score of 79.89%, which is significantly higher than similar algorithms and can provide users with more accurate recommendation results in course recommendation scenarios.