In this paper, an Internet of Things (IoT) system with intelligent face perception and processing functions is used to supervise online English teaching. In an intelligent learning environment, learners learn mainly by watching the screen on which the learning content is presented, i.e., the learning screen, which is both the main setting for learning and the main channel for information interaction between learners and the learning content. The color matching, layout, graphic decoration, and background texture of the learning screen have a significant impact on learners’ emotions, interest, motivation, and learning outcomes. In turn, accurate identification of learners’ emotions is the basis for building harmonious emotional interaction in the intelligent learning environment and an important means of judging learners’ learning status, and is therefore of great significance for promoting intelligent learning. In addition to providing learners with personalized learning content and learning paths, the learning screens presented by the intelligent learning environment should be compatible with learners’ emotional states and visual emotional preferences, and should help regulate and stimulate learners’ learning emotions. The system performed well in testing, which verifies the feasibility, rationality, and effectiveness of applying face perception to monitoring the effectiveness of online teaching. The approach can be combined with the traditional result-oriented method of monitoring online teaching effectiveness, and has both theoretical research significance and practical application value.