Multicultural exchange has become a major trend in international society, and in this context higher vocational English education treats the cultivation of students’ cross-cultural communicative competence as an important educational goal. In this paper, we use the YOFD network, a machine vision model, to detect students’ faces in the English classroom, and we solve for the head pose rotation matrix with the solvePnP function of the OpenCV library. Combining the face and head pose detection results, we infer changes in students’ concentration from their fatigue state and assign each student a classroom concentration score. Drawing on an intercultural communication model with a three-level, multi-component competence structure and on the symbiotic nature of human-computer collaborative teaching, we construct a human-computer symbiotic English teaching model for intercultural communication. Video data of students’ English classroom behavior were collected with a high-speed camera, and a comparison experiment on human-computer symbiotic English teaching was designed. The YOFD model achieved a face detection accuracy (ACC) of 95.36%, and the mean errors of the yaw, pitch, and rotation angles in head pose detection ranged from 1.98° to 3.27°. Students in the experimental group scored 9.95 points higher than the control group in English reading, and their intercultural communication competence improved by 1.065 to 1.434 points across all dimensions. Using machine vision to help English teachers track students’ classroom concentration, combined with the human-computer symbiotic English teaching model, can enhance students’ intercultural communicative competence.
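To make the head pose step concrete, the following is a minimal sketch of how a rotation matrix can be turned into pitch, yaw, and roll angles. It is an illustration under stated assumptions, not the implementation used in this paper. In an OpenCV pipeline, `cv2.solvePnP` returns a rotation vector that `cv2.Rodrigues` converts into a 3×3 rotation matrix; the sketch starts from that matrix and uses only the Python standard library. The function names, the composition order R = Rz(roll) · Ry(yaw) · Rx(pitch), and the reading of the "rotation angle" as roll are assumptions made here for illustration.

```python
import math


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_to_matrix(pitch_deg, yaw_deg, roll_deg):
    """Build R = Rz(roll) * Ry(yaw) * Rx(pitch) (assumed convention)."""
    a, b, g = (math.radians(v) for v in (pitch_deg, yaw_deg, roll_deg))
    rx = [[1, 0, 0],
          [0, math.cos(a), -math.sin(a)],
          [0, math.sin(a), math.cos(a)]]
    ry = [[math.cos(b), 0, math.sin(b)],
          [0, 1, 0],
          [-math.sin(b), 0, math.cos(b)]]
    rz = [[math.cos(g), -math.sin(g), 0],
          [math.sin(g), math.cos(g), 0],
          [0, 0, 1]]
    return matmul3(rz, matmul3(ry, rx))


def matrix_to_euler(rot):
    """Recover (pitch, yaw, roll) in degrees from a rotation matrix.

    Valid away from gimbal lock (|yaw| close to 90 degrees), which this
    sketch does not handle.
    """
    cos_yaw = math.hypot(rot[0][0], rot[1][0])
    pitch = math.atan2(rot[2][1], rot[2][2])
    yaw = math.atan2(-rot[2][0], cos_yaw)
    roll = math.atan2(rot[1][0], rot[0][0])
    return tuple(math.degrees(v) for v in (pitch, yaw, roll))


if __name__ == "__main__":
    # Round trip: a head pitched 10 deg, turned -25 deg, tilted 5 deg.
    rot = euler_to_matrix(10.0, -25.0, 5.0)
    print(matrix_to_euler(rot))
```

Angles recovered this way could then be compared against thresholds (for example, a head turned far from the board) when estimating attention; any such thresholds would need to be chosen and validated separately.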