Previous studies of emotion in education have focused mainly on the superiority of positive emotion (e.g., enjoyment) over negative emotion (e.g., fear) for learning performance. However, few studies have considered the arousal level of learners' emotions. For example, the effects of calm positive or negative emotion have not been compared with those of arousing positive or negative emotion. Based on the Limited Capacity Model of Motivated Mediated Message Processing (LC4MP), this study investigated how learners' emotional valence and arousal, induced by video clips, influenced their learning performance and mental effort in an animated instruction presented in different modalities (written-text versus spoken-text). A total of 206 participants were randomly assigned to eight groups formed by crossing four emotion conditions, (a) calm positive, (b) calm negative, (c) arousing positive, and (d) arousing negative, with two modality conditions (written-text and spoken-text). The results showed that, in the written-text condition only, both arousing groups outperformed the calm groups on a recall test regardless of valence, whereas emotional valence and arousal did not significantly influence learning performance in the spoken-text condition. The results provide partial support for the LC4MP and suggest that an arousing emotional state has the potential to enhance multimedia learning.