This project uses Openfire to implement a virtual 3D animated instant messaging system that is easy to use and extensible. The main work on the client side is to implement the Extensible Messaging and Presence Protocol (XMPP) and use it to send data to, and receive data from, the server, which is built on Openfire. Current mainstream face key point localization models are not robust in complex environments. To address this, the project adopts a deep learning-based approach to design and implement a face key point localization model. Through data preprocessing, model design, and model training, it obtains a robust model that locates 68 face key points, and it ports the model to mobile devices. Current video communication often suffers from latency and stuttering, so this project transmits face key point data instead of a video stream to reduce the load on the network. The project also applies speech encoding and decoding, noise reduction, and echo cancellation to address noise and echo interference in voice transmission. Finally, this paper describes how 3D virtual models are created, imported, and loaded, and explains how face key points are associated with a 3D animation model to drive it. Using individual face key points as examples, it also shows how to make the driven animation smoother and more natural.
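To see why sending key points instead of video frames relieves the network, consider the size of one frame of key point data. The following is a minimal sketch, not the project's actual wire format. It assumes each of the 68 (x, y) points is normalized to the frame size, quantized to a 16-bit unsigned integer, and packed into bytes, and that the bytes are base64-encoded so they can travel in a text-based XMPP message body. The function names `encode_keypoints`, `decode_keypoints`, and `to_xmpp_body` are hypothetical.

```python
import base64
import struct

NUM_KEYPOINTS = 68  # the 68-point face layout located by the model
SCALE = 65535       # assumption: quantize each normalized coordinate to uint16


def encode_keypoints(points, width, height):
    """Hypothetical encoder: quantize (x, y) pixel coordinates to uint16, pack little-endian."""
    if len(points) != NUM_KEYPOINTS:
        raise ValueError(f"expected {NUM_KEYPOINTS} points, got {len(points)}")
    values = []
    for x, y in points:
        # Normalize to [0, 1] relative to the frame (clamped), then scale to 0..65535.
        qx = round(min(max(x / width, 0.0), 1.0) * SCALE)
        qy = round(min(max(y / height, 0.0), 1.0) * SCALE)
        values.extend((qx, qy))
    return struct.pack(f"<{2 * NUM_KEYPOINTS}H", *values)


def decode_keypoints(payload, width, height):
    """Hypothetical decoder: unpack uint16 pairs back into pixel coordinates."""
    values = struct.unpack(f"<{2 * NUM_KEYPOINTS}H", payload)
    return [
        (values[i] / SCALE * width, values[i + 1] / SCALE * height)
        for i in range(0, len(values), 2)
    ]


def to_xmpp_body(payload):
    """XMPP stanzas are XML text, so this sketch base64-encodes the binary payload."""
    return base64.b64encode(payload).decode("ascii")


if __name__ == "__main__":
    # Synthetic points inside a 640x480 frame, for illustration only.
    points = [(i * 5.0, i * 3.0) for i in range(NUM_KEYPOINTS)]
    payload = encode_keypoints(points, 640, 480)
    body = to_xmpp_body(payload)
    print(len(payload), len(body))  # 272 bytes raw, 364 characters after base64
```

Under these assumptions one frame occupies 68 × 2 × 2 = 272 bytes, or 364 characters after base64. At an assumed 30 frames per second, that is about 10.9 KB/s before XMPP stanza overhead. Quantization error stays below 0.01 px for a 640×480 frame, which is far finer than the precision needed to drive a 3D model.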