In the age of big data, visual communication has emerged as a critical means of engaging customers. Among the many modes of visual communication, digital animation advertising is an exceptionally potent tool: advertisers can harness digital animation technology to create lively, compelling ads. This article proposes a multimodal visual communication system (MVCS) model based on multimodal video emotion analysis. The model automatically adjusts video content and playback mode according to users' emotions and interests, enabling more personalized video communication. Trained on a large-scale video dataset, the MVCS model analyzes videos along multiple dimensions, including vision, sound, and text. Convolutional neural networks extract the visual features of videos, while recurrent neural networks extract the audio and text features and analyze their emotional content. By integrating this feature information, the MVCS model dynamically adjusts a video's playback mode based on users' emotions and interaction behaviors, thereby increasing its play count. To evaluate the approach's effectiveness, we conducted a satisfaction survey on 106 digital ads adjusted using the MVCS method. Results showed that 92.6% of users were satisfied with the adjusted ads, indicating the MVCS model's efficacy in enhancing digital ad design.
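The pipeline described above, in which per-modality features (visual via a convolutional stage, audio and text via recurrent stages) are extracted and then fused into a joint representation for emotion scoring, can be illustrated with a minimal toy sketch. This is not the authors' implementation: all dimensions, weights, and the simple concatenation-based late fusion are illustrative assumptions, using plain NumPy in place of a deep-learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_pool_features(frames, kernel):
    """Toy 'CNN' stage: 1-D convolution over each frame vector,
    ReLU, then global max-pooling to one scalar per frame."""
    feats = []
    for frame in frames:
        conv = np.convolve(frame, kernel, mode="valid")
        feats.append(np.maximum(conv, 0.0).max())
    return np.array(feats)

def rnn_features(sequence, W_h, W_x):
    """Toy recurrent stage: tanh RNN over a sequence,
    returning the final hidden state as the feature vector."""
    h = np.zeros(W_h.shape[0])
    for x in sequence:
        h = np.tanh(W_h @ h + W_x @ x)
    return h

# Hypothetical toy inputs: 4 video frames (8-dim each) and
# audio/text sequences of 5 steps (3-dim each).
frames = rng.normal(size=(4, 8))
audio_seq = rng.normal(size=(5, 3))
text_seq = rng.normal(size=(5, 3))

hidden = 6
W_h = rng.normal(scale=0.1, size=(hidden, hidden))
W_x = rng.normal(scale=0.1, size=(hidden, 3))

visual = conv_pool_features(frames, kernel=np.array([0.25, 0.5, 0.25]))
audio = rnn_features(audio_seq, W_h, W_x)
text = rnn_features(text_seq, W_h, W_x)

# Late fusion: concatenate per-modality features into one joint vector.
fused = np.concatenate([visual, audio, text])

# Illustrative emotion score from the fused features; a real system
# would learn this mapping and drive playback adjustment from it.
w = rng.normal(size=fused.shape[0])
score = float(w @ fused)
label = "engaged" if score > 0 else "disengaged"
print(fused.shape, label)
```

In this sketch the fused vector has 4 + 6 + 6 = 16 dimensions (one pooled value per frame plus the two recurrent hidden states); a production system would instead fuse learned embeddings and feed the result to a trained classifier that decides how to adjust the video's content and playback mode.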