English listening practice is an effective way to improve students’ expressive ability and oral communication in English. In current practice, however, English teaching methods are too uniform, and teachers pay little attention to oral training in the classroom, which lowers the efficiency of classroom teaching. Following the principles of wholeness, interaction, balance, and sustainable development from educational ecology, enhancing the synergy among the ecological elements of the English speaking classroom, promoting interactive dialogue among ecological subjects, and regulating classroom behaviors are conducive to giving full play to the advantages of information technology in the reform of English speaking teaching and to promoting its sustainable development. This paper addresses the current state of English listening teaching, in particular the drop in spoken-language recognition rate in noisy environments, and proposes a dual-sensor speech recognition system and its operating principle. Sensors acquire both the weak vibration-pressure speech signal at the jaw skin and the speech signal transmitted through the air during vocalization, and a speech recognition method based on a recurrent neural network is designed on these signals. This deep learning algorithm is applied to speech recognition in English teaching. A reasonable frame sampling frequency is set to acquire the English speech signal; feature parameters representing the signal are then computed as linear prediction coefficients and assembled into a speech feature vector, and the recurrent neural network is trained on these speech features. In experiments comparing the proposed algorithm with commonly used speech recognition algorithms, the proposed algorithm shows higher accuracy and faster convergence for speech recognition in English teaching.
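The feature pipeline described above (framing, linear prediction coefficients, a recurrent network over the frame sequence) can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper. The 16 kHz sampling rate, 25 ms frames, 10 ms hop, LPC order 12, the Hamming window, the per-frame concatenation of the two sensor channels, and all function names are assumptions made for illustration. The recurrent cell uses random, untrained weights and only shows how frame features would flow through a plain (Elman-style) network; training is omitted.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def lpc(frame, order):
    """LPC coefficients a[0..order-1] via the autocorrelation method and
    the Levinson-Durbin recursion, so that x[n] ~ sum_k a[k] * x[n-1-k]."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order)
    err = r[0]
    if err <= 0:                      # silent frame: no predictor to fit
        return a
    for i in range(order):
        # Reflection coefficient for model order i + 1.
        k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / err
        prev = a[:i].copy()
        a[:i] = prev - k * prev[::-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def extract_features(signal, frame_len, hop, order):
    """Hamming-window each frame and return an (n_frames, order) LPC matrix."""
    frames = frame_signal(signal, frame_len, hop) * np.hamming(frame_len)
    return np.array([lpc(f, order) for f in frames])

def rnn_forward(features, w_in, w_rec, w_out):
    """Run an Elman-style recurrent cell over the frame sequence and
    return class scores computed from the final hidden state."""
    h = np.zeros(w_rec.shape[0])
    for x in features:
        h = np.tanh(w_in @ x + w_rec @ h)
    return w_out @ h

# Demo on synthetic stand-in signals (assumed values, not data from the paper).
rng = np.random.default_rng(0)
sr = 16000                             # assumed sampling rate in Hz
air = rng.standard_normal(sr)          # stand-in for the air-conducted signal
skin = rng.standard_normal(sr)         # stand-in for the jaw-skin vibration signal

# Assumption: the two sensor channels are fused by concatenating frame features.
feats = np.hstack([extract_features(ch, 400, 160, 12) for ch in (air, skin)])

hidden, n_classes = 16, 5              # hypothetical sizes
w_in = 0.1 * rng.standard_normal((hidden, feats.shape[1]))
w_rec = 0.1 * rng.standard_normal((hidden, hidden))
w_out = 0.1 * rng.standard_normal((n_classes, hidden))
scores = rnn_forward(feats, w_in, w_rec, w_out)
```

In a trained system the weights would be learned from labeled utterances, for example by backpropagation through time, and the class scores would be mapped to words or phonemes; both steps are outside this sketch.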