Recent research has suggested that dynamic emotion recognition involves strong audiovisual association; that is, facial or vocal information alone automatically induces perceptual processes in the other modality. We hypothesized that emotions may differ in the automaticity of audiovisual association, resulting in differential audiovisual information processing. Participants judged the emotion of a talking-head video under audiovisual, video-only (with no sound), and audio-only (with a static neutral face) conditions. Among the six basic emotions, disgust had the largest audiovisual advantage over the unimodal conditions in recognition accuracy. In the recognition of all the emotions except disgust, participants' eye-movement patterns did not change significantly across the three conditions, suggesting mandatory audiovisual information processing. In contrast, in disgust recognition, participants' eye movements in the audiovisual condition were less eyes-focused than in the video-only condition and more eyes-focused than in the audio-only condition, suggesting that audio information in the audiovisual condition interfered with eye-movement planning toward the features (eyes) that are important for recognizing disgust. Moreover, participants whose eye-movement patterns were less affected by concurrent disgusted voice information benefited more in recognition accuracy. Disgust recognition is learned later in life and thus may involve a reduced amount of audiovisual associative learning. Consequently, audiovisual association in disgust recognition may be less automatic and demand more attentional resources than in the recognition of other emotions. Thus, audiovisual information processing in emotion recognition depends on the automaticity of the emotion's audiovisual association, which results from associative learning. This finding has important implications for real-life emotion recognition and multimodal learning.