As the pace of global integration increases, so does the demand for English language courses. Because English-language learning resources are scarce in China, students often need help to improve their spoken English. Deep learning (DL) technology can be employed to address this need, as advances in artificial intelligence and in language education approaches have opened a new phase of language teaching and learning. Speech recognition software is the foundation of spoken-communication instruction and also serves as an evaluation tool. Analyzing speech signals demands more hardware, software, and algorithmic resources because of the complexity of pronunciation variations, the volume of speech signal data, the number of speech feature parameters, and the scale of the computation required for speech recognition and assessment. However, it is difficult to further improve the accuracy and speed of conventional speech recognition algorithms, which have run into unprecedented bottlenecks. To address these issues, this article examines the impact of college English multimedia instruction. The EMLP-SNN technique, which improves the integration of a multilayer perceptron with spiking neural networks, is proposed for identifying oral English pronunciation. Experimental results show that the proposed algorithm achieves an accuracy of 97.5%, which can help students identify discrepancies between their pronunciation and the standard and correct pronunciation mistakes, leading to better performance in oral English learning.

Povzetek (summary): The study introduces the EMLP-SNN technique for improving the identification of English pronunciation, with 97.5% accuracy, which enables students to improve their learning of spoken English.