This paper proposes a model-based method for real-time automatic mood estimation in video sequences. The approach is customized by learning the person’s specific facial parameters, which are transformed into facial Action Units (AUs). A model mapping for mood representation is used to describe moods in terms of the PAD space: Pleasure, Arousal, and Dominance. From the intersection of these dimensions, eight octants represent fundamental mood categories. In the experimental evaluation, a stimulus video randomly selected from a set prepared to elicit different moods was played to participants, while the participant’s facial expressions were recorded. From the experiment, Dominance is the dimension least impacted by facial expression, and this dimension could be eliminated from mood categorization. Then, four categories corresponding to the quadrants of the Pleasure–Arousal (PA) plane, “Exalted”, “Calm”, “Anxious” and “Bored”, were defined, with two more categories for the “Positive” and “Negative” signs of the Pleasure (P) dimension. Results showed a 73% of coincidence in the PA categorization and a 94% in the P dimension, demonstrating that facial expressions can be used to estimate moods, within these defined categories, and provide cues for assessing users’ subjective states in real-world applications.