At present, most robot dances are pre-programmed. Changing the music requires manual adjustment of the relevant parameters and meta-movements, which greatly reduces both the entertainment value and the apparent intelligence of the robot. To address these problems, this paper designs a CNN system, studies a multimodal dance movement recognition algorithm based on artificial intelligence image technology, and builds an example multimodal dance movement computing system. The results show that, in multimodal dance movement recognition with image technology, moving from the CNN algorithm to the CNN network optimized with a Winograd algorithm-based coprocessor reduces the runtime from a maximum of 132 s to 26 s, a reduction of up to about 80%; reduces the memory access criterion from a maximum of 73.5% to 16.2%, a reduction of up to 57.3 percentage points; and reduces the power consumption ratio from a maximum of 93.6% to 25.2%, a reduction of up to 68.4 percentage points. The maximum accuracy of the proposed optimization method is 95.1%. The proposed solution addresses the insufficient performance of traditional dance movement recognition and is expected to contribute to the development of artificial intelligence and the dance industry.
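To make the source of the runtime savings more concrete, the following is a minimal sketch of Winograd minimal filtering in its simplest one-dimensional form, F(2,3), which produces two outputs of a 3-tap filter with four multiplications instead of six. The abstract does not state which tile size or data layout the coprocessor uses, so the choice of F(2,3) and the function names `winograd_f23` and `direct_f23` are assumptions made purely for illustration, not the system described in this paper.

```python
def direct_f23(d, g):
    """Reference: two outputs of a 3-tap filter g slid over 4 inputs d (6 multiplications)."""
    y0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2]
    y1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2]
    return [y0, y1]


def winograd_f23(d, g):
    """Winograd F(2,3): the same two outputs using 4 data-dependent multiplications.

    The filter-side terms (g0 + g1 + g2) / 2 and (g0 - g1 + g2) / 2 depend only
    on the weights, so in a CNN they can be precomputed once per filter.
    """
    g_sum = (g[0] + g[1] + g[2]) / 2
    g_alt = (g[0] - g[1] + g[2]) / 2

    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * g_sum
    m3 = (d[2] - d[1]) * g_alt
    m4 = (d[1] - d[3]) * g[2]

    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]      # hypothetical input tile
    weights = [1.0, 0.0, -1.0]       # hypothetical 3-tap filter
    print(direct_f23(data, weights))
    print(winograd_f23(data, weights))
```

Both functions should agree up to floating-point rounding. In a convolutional layer the same idea is usually applied in two dimensions over small tiles, which is where the reduction in multiplications, and hence in runtime and power, would be expected to come from.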