“…Among the most relevant ones, Ye et al. [57] focus on the language mapping between action descriptions and avatar animation rather than on the visual content, and Pizzi et al. [37] only generate sketch-style static images. Besides, intelligent creation tools are in great demand, as they can help users efficiently create customized dynamic content, e.g., video and animation [27,28]. Some researchers focus on several key steps, such as frame composition [60], shot selection [23,25], and shot cut suggestion [35].…”