We address the problem of motion recovery for a head-eye system from stereo image sequences. Two types of motion are considered: the translation of the vehicle and the panning motion of the head. We show how these motions and the depth map of the scene can be estimated directly from measurements of image gradients and time derivatives in a sequence of stereo images. There is no need to estimate image motion, to track scene features over time, or to establish point correspondences in a stereo image pair. We present results of experiments with real scenes.
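As a rough illustration of what "measurements of image gradients and time derivatives" can mean in practice, the sketch below estimates the spatial derivatives Ex, Ey and the temporal derivative Et from two consecutive grayscale frames, and evaluates the standard brightness constancy constraint Ex*u + Ey*v + Et = 0 for a candidate image velocity (u, v). This is only a minimal sketch under stated assumptions: the abstract does not specify the constraint equation or the derivative filters, and the function names `brightness_derivatives` and `constraint_residual` are hypothetical. It does not implement the motion and depth estimation itself.

```python
import numpy as np


def brightness_derivatives(frame0, frame1):
    """Estimate (Ex, Ey, Et) from two consecutive grayscale frames.

    Assumption: simple finite differences; the actual filters are not
    specified in the source.
    """
    f0 = np.asarray(frame0, dtype=float)
    f1 = np.asarray(frame1, dtype=float)
    # Spatial gradients are taken on the temporal average of the two frames.
    mean = 0.5 * (f0 + f1)
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x).
    Ey, Ex = np.gradient(mean)
    # Temporal derivative with a unit time step between frames.
    Et = f1 - f0
    return Ex, Ey, Et


def constraint_residual(Ex, Ey, Et, u, v):
    """Residual of the brightness constancy constraint Ex*u + Ey*v + Et.

    A residual near zero means the velocity (u, v) is consistent with the
    measured derivatives at that pixel.
    """
    return Ex * u + Ey * v + Et
```

In a direct method, (u, v) would not be estimated per pixel; it would instead be written in terms of the unknown motion parameters and depth, so that the residual constrains those quantities without tracking features or matching points.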