Although critical grammatical information is expressed through facial expressions and head gestures, most research in sign language recognition has focused on the manual component of signing. We propose a novel framework for robust tracking and analysis of non-manual behaviours, with an application to sign language recognition. The novelty of our method is threefold. First, we propose a dynamic feature representation: instead of using only the features available in the current frame (e.g., head pose), we additionally aggregate and encode the feature values in neighbouring frames to better capture the dynamics of expressions and gestures (e.g., head shakes). Second, we use Multiple Instance Learning [12] to handle feature misalignment resulting from drift of the face tracker and partial occlusions. Third, we use a discriminative Hidden Markov Support Vector Machine (HMSVM) [1] to learn finer temporal dependencies between the features of interest. We apply our signer-independent framework to segmented recognition of five classes of grammatical constructions conveyed through facial expressions and head gestures: wh-questions, negation, conditional/when clauses, yes/no questions, and topics, and we show improvement over previous methods.
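To illustrate the intuition behind the dynamic feature representation described above, the following minimal sketch concatenates each frame's descriptors with those of its temporal neighbours so that a per-frame classifier can see dynamics such as head shakes. The function name, the fixed window radius, and the use of simple concatenation are our own illustrative assumptions, not the specific encoding used in the paper.

```python
import numpy as np

def windowed_features(frame_features, radius=3):
    """Aggregate per-frame descriptors over a temporal window (illustrative sketch).

    frame_features: (T, D) array of per-frame descriptors
        (e.g., head-pose angles or facial feature displacements).
    radius: number of neighbouring frames on each side to include.

    Returns a (T, D * (2 * radius + 1)) array in which each row concatenates
    the descriptors of a frame and its neighbours, exposing short-term
    dynamics (e.g., head shakes) rather than a single static pose.
    """
    T, D = frame_features.shape
    # Pad by repeating the first/last frame so every window is complete.
    padded = np.pad(frame_features, ((radius, radius), (0, 0)), mode="edge")
    windows = [padded[t : t + 2 * radius + 1].ravel() for t in range(T)]
    return np.asarray(windows)
```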