Synthesizing expressive facial animation for avatars from video is an important problem in computer animation. Some traditional methods track a set of semantic feature points on the face to drive the avatar. However, these methods usually suffer from inaccurate detection and the sparseness of the feature points, and fail to obtain a high-level understanding of facial expressions, leading to less expressive and even incorrect expressions on the avatar. In this paper, we propose a state-aware synthesis framework. Instead of simply fitting a 3D face to the 2D feature points, we use expression states, obtained by a set of low-cost classifiers (based on local binary patterns and support vector machines) applied to the face texture, to guide the face fitting procedure. Our experimental results show that the proposed hybrid framework combines the advantages of the original feature-point-based methods with the expression-state awareness provided by the classifiers, and thus produces more vivid and richer facial expressions on the avatar.
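To make the low-cost classifier concrete, the following is a minimal sketch of an LBP-plus-SVM expression-state classifier of the kind the abstract describes: uniform LBP histograms computed over a grid of face regions, concatenated into a texture feature vector, and fed to a linear SVM. All names, parameters (P = 8, R = 1, the 4x4 grid), and the training data are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import LinearSVC

P, R = 8, 1         # LBP neighborhood: 8 samples at radius 1 (assumed)
N_BINS = P + 2      # number of distinct "uniform" LBP codes for P samples

def lbp_histogram(gray_patch):
    """Normalized uniform-LBP histogram for one face region."""
    codes = local_binary_pattern(gray_patch, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=N_BINS, range=(0, N_BINS), density=True)
    return hist

def extract_features(gray_face, grid=(4, 4)):
    """Concatenate per-cell LBP histograms over a grid of face regions,
    so the feature retains coarse spatial layout (mouth, eyes, brows)."""
    h, w = gray_face.shape
    cells = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            patch = gray_face[i * h // grid[0]:(i + 1) * h // grid[0],
                              j * w // grid[1]:(j + 1) * w // grid[1]]
            cells.append(lbp_histogram(patch))
    return np.concatenate(cells)

def train_state_classifier(faces, labels):
    """Train one binary expression-state classifier (e.g. mouth open vs.
    closed); `faces` are grayscale crops and `labels` their annotated states,
    both assumed to come from a labeled face dataset."""
    X = np.stack([extract_features(f) for f in faces])
    clf = LinearSVC(C=1.0)
    clf.fit(X, labels)
    return clf
```

In a framework like the one proposed, one such binary classifier per expression state would run on each video frame, and the predicted states would then constrain or re-weight the 3D face fitting rather than drive the avatar directly.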