The metaverse aims to build, in cyberspace, a virtual world that both maps the real world and remains independent of it, drawing on the growing maturity of digital technologies such as virtual reality (VR), augmented reality (AR), big data, and 5G. It is important for the future development of a wide variety of professions, including education. The metaverse represents the latest stage in the development of visual immersion technology. In essence, it is an online digital space parallel to the real world, and it is becoming a practical field for the innovation and development of human society. The most prominent advantage of the English-teaching metaverse is that it provides teachers and students with an immersive and interactive teaching field that meets their teaching and learning needs in both the physical and virtual worlds. This study constructs an experiential situational English-teaching scenario and proposes convolutional neural network (CNN)–recurrent neural network (RNN) fusion models to recognize students' emotions from electroencephalogram (EEG) signals during experiential English teaching, using the feature spaces of the time, frequency, and spatial domains. Based on EEG data collected from students with the OpenBCI EEG Electrode Cap Kit, the experiential English-teaching scenario is designed in three types: sequential guidance, comprehensive exploration, and crowd-creation construction. Experimental data analysis of the three kinds of learning activities shows that metaverse-powered experiential situational English teaching can improve students' sense of interactivity, immersion, and cognition, and that the CNN–RNN fusion model outperforms the baselines in both accuracy and analysis time. This study can provide a useful reference for recognizing students' emotions during the COVID-19 pandemic.