The tracking and recognition of facial activities from images or videos has attracted great attention in the computer vision field. Facial activities are characterized at three levels. First, at the bottom level, facial feature points around each facial component (e.g., eyebrow, mouth) capture detailed face shape information. Second, at the middle level, facial action units (AUs), defined in the Facial Action Coding System, represent the contraction of a specific set of facial muscles (e.g., lid tightener, eyebrow raiser). Finally, at the top level, six prototypical facial expressions represent global facial muscle movement and are commonly used to describe human emotional states. In contrast to mainstream approaches, which usually focus on only one or two levels of facial activity and track (or recognize) them separately, this paper introduces a unified probabilistic framework based on the dynamic Bayesian network (DBN) to simultaneously and coherently represent the facial evolution at different levels, the interactions among the levels, and their observations. Advanced machine learning methods are introduced to learn the model from both training data and subjective prior knowledge. Given the model and the measurements of facial motion, all three levels of facial activity are recognized simultaneously through probabilistic inference. Extensive experiments illustrate the feasibility and effectiveness of the proposed model at all three levels of facial activity.