Following the development trend in the arrangement structure of competitive aerobics, this paper studies a multimedia-based method for the online arrangement of difficult actions in competitive aerobics, with the aim of improving the arrangement effect. RGB images, optical flow images, and corrected optical flow images serve as the input modalities of a network, based on top-down feature fusion, that recognizes difficult actions in competitive aerobics video. Key frames of each input modality are extracted with a key frame extraction method based on subshot segmentation using a double-threshold sliding window and a fully connected graph. Forward propagation yields the score vector of a video with respect to all categories, and normalizing this vector gives a probability distribution over the categories. This completes human action recognition in competitive aerobics video, and the online arrangement of difficult actions is built on the recognition results. The experimental results show that the method identifies difficult actions in competitive aerobics video with high accuracy, and that the resulting online arrangement of difficult actions meets users' needs and is practical.
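The normalization step mentioned above can be illustrated with a minimal sketch. It assumes the normalization is a softmax, which is the usual choice but is not stated in the abstract; the function names, category labels, and score values are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Normalize a score vector into a probability distribution.

    Assumption: softmax is used; the abstract only says "normalization".
    """
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores, labels):
    """Return the most probable category and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical categories and scores, for illustration only.
labels = ["push-up", "split leap", "support"]
scores = [1.0, 3.0, 0.5]
label, prob = predict(scores, labels)
print(label, round(prob, 3))
```

The category with the largest score also receives the largest probability, so the normalization does not change the predicted action; it only makes the scores comparable across videos.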