This paper presents a novel approach to automatic visual speech recognition based on the Convolutional VEF snake and canonical correlations. Image sequences of isolated Chinese word utterances are recorded with a head-mounted camera, and the Convolutional VEF snake model is used to detect and track the lip boundary rapidly and accurately. Geometric and motion features are extracted from the lip contour sequences and concatenated to form a joint feature descriptor. Canonical correlation is applied to measure the similarity between two utterance feature matrices, and a linear discriminant function is introduced to further improve recognition accuracy. Experimental results demonstrate that the approach is promising and that the joint feature descriptor is more robust than either feature type alone.
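
As a rough illustration of the similarity measure, canonical correlations between two feature matrices can be computed as the cosines of the principal angles between their column spaces. The sketch below assumes each utterance is stored as a matrix with one column per frame and full column rank. The function names, the QR-based orthonormalization, and the averaging over the top `k` correlations are illustrative assumptions, not details taken from this paper.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Cosines of the principal angles between the column spaces of X and Y.

    X, Y: arrays of shape (d, n_x) and (d, n_y), one column per frame
    (assumed layout). Both are assumed to have full column rank.
    """
    # Orthonormal bases for the two column spaces.
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    # Singular values of Qx^T Qy are the canonical correlations,
    # returned in descending order.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    # Guard against values slightly outside [0, 1] from round-off.
    return np.clip(s, 0.0, 1.0)


def similarity(X, Y, k=3):
    """Hypothetical similarity score: mean of the k largest correlations."""
    return float(np.mean(canonical_correlations(X, Y)[:k]))
```

A score near 1 indicates that the two utterances span nearly the same subspace, and a score near 0 indicates nearly orthogonal subspaces. A nearest-neighbour rule over such scores would be one simple way to classify an utterance; the linear discriminant step mentioned above is not shown here.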