This paper adopts a multimodal, human-computer collaborative approach to study the practical teaching model of preschool education in depth, and applies the resulting model to actual teaching. Multimodal theory is applied to preschool teaching for two reasons. Theoretically, it broadens the research scope of multimodal theory and enriches research on preschool teaching. Practically, it moves beyond the earlier single-modal teaching model, strengthens the theoretical guidance available for preschool teaching, and improves the quality of preschool classroom instruction. From the perspective of human-machine synergy, the paper then analyzes the respective strengths of artificial intelligence technology and of teachers in the English classroom, proposes new roles for teachers and learners in a human-computer collaborative teaching environment, and discusses the significance and value of applying the four main modules of human-computer collaborative teaching, including human-computer gesture mapping and human-computer collaborative manipulator control, in the preschool classroom. Based on the physical structure of the hand joints, the human hand joint angles are obtained by solving the inverse kinematics and are mapped one-to-one onto the joints of a dexterous manipulator. The manipulator can thus be controlled to imitate the human hand in performing flexible gestures, which realizes vision-based human-machine collaborative control of the dexterous manipulator. Finally, drawing on Gagné's nine events of instruction, a model of the English teaching process based on human-computer collaboration was constructed.
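The inverse-kinematics and one-to-one mapping step can be illustrated with a minimal sketch. It assumes a simplified planar finger with two links (proximal and distal) and two flexion joints, labelled MCP and PIP here; the paper does not specify its hand model, so the link lengths, joint limits, and the names `finger_ik`, `finger_fk`, and `map_to_manipulator` are hypothetical. The recovered human joint angles are passed to the corresponding manipulator joints one by one, clamped to each joint's assumed range.

```python
import math


def finger_ik(x, y, l1, l2):
    """Analytic inverse kinematics of a hypothetical planar two-link finger.

    (x, y): fingertip position relative to the MCP joint.
    l1, l2: proximal and distal link lengths (assumed model).
    Returns (theta_mcp, theta_pip) in radians, angles measured
    counterclockwise; theta_pip is taken from acos, so it lies in [0, pi]
    (a single bending direction, as for a human finger).
    """
    d2 = x * x + y * y
    cos_pip = (d2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if not -1.0 <= cos_pip <= 1.0:
        raise ValueError("target is out of reach for this finger model")
    theta_pip = math.acos(cos_pip)
    theta_mcp = math.atan2(y, x) - math.atan2(
        l2 * math.sin(theta_pip), l1 + l2 * math.cos(theta_pip)
    )
    return theta_mcp, theta_pip


def finger_fk(theta_mcp, theta_pip, l1, l2):
    """Forward kinematics of the same model, used to check the IK result."""
    x = l1 * math.cos(theta_mcp) + l2 * math.cos(theta_mcp + theta_pip)
    y = l1 * math.sin(theta_mcp) + l2 * math.sin(theta_mcp + theta_pip)
    return x, y


def map_to_manipulator(human_angles, joint_limits):
    """One-to-one mapping: clamp each human joint angle into the
    (hypothetical) range of the corresponding manipulator joint."""
    return [min(max(a, lo), hi) for a, (lo, hi) in zip(human_angles, joint_limits)]


if __name__ == "__main__":
    # Hypothetical fingertip target and link lengths (cm).
    t_mcp, t_pip = finger_ik(3.0, 2.0, 4.0, 3.0)
    print("human angles (rad):", t_mcp, t_pip)
    print("fingertip check:", finger_fk(t_mcp, t_pip, 4.0, 3.0))
    # Assumed manipulator joint limits in radians.
    print("manipulator command:", map_to_manipulator([t_mcp, t_pip], [(-0.35, 1.57), (0.0, 1.57)]))
```

Running forward kinematics on the IK output recovers the target fingertip position, which is a simple way to validate the solver. A real vision-based system would additionally need a hand-pose estimator to supply fingertip positions and a richer model for the thumb's extra degrees of freedom.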
Based on this model, the “EasyDotWise English Teaching System” was designed. Combining the basic lesson types of preschool classroom teaching with the secondary objectives of the English curriculum standards, we designed and implemented three types of teaching activities: “reading text–reading aloud assessment,” “playing phonetic sounds–sound identification,” and “presenting text–comprehension selection.”