To study the interaction between pictures and words, we investigated how their actual arousal levels vary by observing the positivity offset and negativity bias under different conditions. We used emotional pictures and emotional Chinese words to construct stimuli under four conditions: (1) only a word was presented (Word Only condition), (2) only a picture was presented (Picture Only condition), (3) a word was presented before a picture (Word Before Picture condition), and (4) a picture was presented before a word (Picture Before Word condition). The picture and word in each target pair under conditions (3) and (4) were congruent in content and emotion. A significant negativity bias was observed under the Picture Only and Word Only conditions, whereas effects analogous to a positivity offset were observed under the other two conditions. These findings suggest that the actual arousal levels of multiple stimuli, such as those presented under conditions (3) and (4), differ from those of individual pictures and individual words, while the actual arousal levels of individual pictures do not differ significantly from those of individual words. The results indicate that content consistency can reduce emotional attention, and that the influence of pictures on words differs from the influence of words on pictures.