In this paper, the multimodal data on students’ learning behaviours in a digital English teaching environment are classified by data type into sound, image and text modalities. The multimodal behavioural data on students’ English learning are then preprocessed according to the characteristics of each modality and its factors. Exploiting the algorithmic strengths of the deep hybrid discriminative restricted Boltzmann machine (HDRBM) neural network, we use the HDRBM to produce accurate learning-evaluation annotations of learning behaviour features, which yields an HDRBM-based learning behaviour feature extraction model. The optimal parameter settings of the HDRBM are determined through training tests, and the tuned model is used to extract features of students’ digital English learning behaviour. Students are then divided into English learning performance levels, and the correlation between different learning behaviours and learning performance is examined. The digital English learning behaviour scores are 23.5 points for attention, 15.8 points for learning motivation, 36.7 points for learning attitude and 24.9 points for learning strategies; learning motivation has the lowest score of the four. The scores of the group with learning difficulties differ significantly from those of the intermediate, good and excellent groups. Students’ English learning achievement can therefore be improved by adjusting their learning behaviours.
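The abstract does not specify the internal structure of the HDRBM. As a rough illustration only, the sketch below assumes the standard classification RBM formulation on which hybrid discriminative RBMs are built: visible features x, binary hidden units h and a one-hot class label y are modelled jointly, and the class posterior p(y | x) has a closed form. A hybrid model is trained on a weighted sum of the discriminative loss −log p(y | x) and a generative loss −log p(x, y); only the discriminative term is shown here, because the generative gradient requires sampling (e.g. contrastive divergence). The function names `class_posterior` and `discriminative_nll`, the mapping of the four classes to the four performance levels, and all weight values are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def class_posterior(
    x: Sequence[float],
    W: Matrix,  # W[j][i]: weight between hidden unit j and visible feature i
    U: Matrix,  # U[j][y]: weight between hidden unit j and class y
    c: Sequence[float],  # hidden-unit biases
    d: Sequence[float],  # class biases
) -> List[float]:
    """Return p(y | x) for every class y in a classification RBM (assumed form).

    log p(y | x) = d_y + sum_j softplus(c_j + U[j][y] + sum_i W[j][i] * x_i) - log Z(x)
    """
    # Hidden pre-activations driven by the visible features (shared by all classes).
    base = [c[j] + sum(w * xi for w, xi in zip(W[j], x)) for j in range(len(c))]
    # Unnormalised log-probability (negative free energy) of each class.
    scores = [
        d[y] + sum(softplus(base[j] + U[j][y]) for j in range(len(c)))
        for y in range(len(d))
    ]
    # Softmax with max-subtraction for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def discriminative_nll(
    x: Sequence[float], y: int, W: Matrix, U: Matrix,
    c: Sequence[float], d: Sequence[float],
) -> float:
    """Discriminative term of a hybrid objective: -log p(y | x)."""
    return -math.log(class_posterior(x, W, U, c, d)[y])


if __name__ == "__main__":
    # Hypothetical toy model: 4 behaviour features, 3 hidden units, 4 performance levels.
    x = [0.8, 0.2, 0.6, 0.5]
    W = [[0.5, -0.2, 0.1, 0.0], [0.0, 0.3, -0.4, 0.2], [0.1, 0.1, 0.1, 0.1]]
    U = [[1.0, 0.0, -0.5, 0.2], [-0.3, 0.4, 0.0, 0.1], [0.0, 0.0, 0.2, 0.6]]
    c = [0.0, 0.1, -0.1]
    d = [0.0, 0.0, 0.0, 0.0]
    print(class_posterior(x, W, U, c, d))
    print(discriminative_nll(x, 3, W, U, c, d))
```

Because the hidden units can be summed out analytically, p(y | x) is computed exactly without sampling; this is what makes the discriminative part of a hybrid RBM cheap to evaluate compared with its generative part.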