The emergence of the MPEG-4 standard has facilitated the development of new approaches to very low bit-rate coding, including model-based coding of human faces. As a tool in the Synthetic/Natural Hybrid Coding (SNHC) part of the standard, a face object is defined in terms of its shape, which consists of a set of facial feature points, and its texture. This face object can be animated by controlling the facial feature points to produce a visual manifestation of speech and facial expression. However, extracting these facial feature points from a particular human face is not a trivial task and lies outside the scope of the MPEG-4 standard. In this paper, we introduce a system that automatically extracts 3-D facial feature points and calibrates a generic face model to a particular speaker's face. Our approach differs from previous work in that we extract the facial feature points in 3-D using a stereoscopic camera, which leads to an automatic and robust calibration of the face model to the particular face while remaining computationally simple. Preliminary experimental results show promising performance.