Gaze tracking determines what a user is looking at; the key challenge is to obtain well-focused eye images. This is difficult because the human eye is very small, yet the image resolution must be high enough for accurate detection of the pupil center. In addition, capturing a user's eye image with a remote gaze tracking system within a large working volume at a long Z distance requires a panning/tilting mechanism with a zoom lens, which makes it even harder to acquire focused eye images. To solve this problem, a new auto-focusing method for remote gaze tracking is proposed. The proposed approach is novel in the following four ways. First, it is the first research on an auto-focusing method for a remote gaze tracking system. Second, by using user-dependent calibration at the initial stage, it overcomes the weakness of previous methods that estimate the Z distance between a user and the camera from the facial width in the captured image, namely that facial width varies from person to person. Third, the parameters of the modeled formula for estimating the Z distance are adaptively updated using the least squares regression method, so the focus becomes more accurate over time. Fourth, the relationship between the parameters and the face width is fitted locally according to the Z distance instead of by global fitting, which can enhance the accuracy of Z distance estimation. In an experiment with 10,000 images of 10 persons, the mean absolute error between the ground-truth Z distance measured by a Polhemus Patriot device and the Z distance estimated by the proposed method was 4.84 cm. A total of 95.61% of the images obtained by the proposed method were focused and could be used for gaze detection.
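The idea of estimating the Z distance from the face width and refining the parameters by least squares regression can be illustrated with a minimal sketch. The abstract does not state the modeled formula, so the form Z = a / w + b (with w the face width in pixels) is an assumption borrowed from a simple pinhole camera model, and the class and method names (`ZDistanceEstimator`, `update`, `estimate`) are hypothetical. The sketch fits one global model per user and does not reproduce the local fitting described above.

```python
class ZDistanceEstimator:
    """Per-user Z-distance model, ASSUMED to be Z ~= a / w + b.

    w is the face width in pixels; a and b are fitted by ordinary
    least squares on x = 1 / w. This formula is an illustrative
    assumption, not the formula used by the proposed method.
    """

    def __init__(self):
        # Running sums, so the fit can be refreshed after every new sample.
        self.n = 0
        self.sx = 0.0   # sum of x
        self.sy = 0.0   # sum of z
        self.sxx = 0.0  # sum of x * x
        self.sxy = 0.0  # sum of x * z
        self.a = 0.0
        self.b = 0.0

    def update(self, face_width_px, z_cm):
        """Add one (face width, known Z) pair and refit a and b."""
        x = 1.0 / face_width_px
        self.n += 1
        self.sx += x
        self.sy += z_cm
        self.sxx += x * x
        self.sxy += x * z_cm
        denom = self.n * self.sxx - self.sx * self.sx
        # Refit only when at least two distinct face widths have been seen.
        if self.n >= 2 and abs(denom) > 1e-12 * self.n * self.sxx:
            self.a = (self.n * self.sxy - self.sx * self.sy) / denom
            self.b = (self.sy - self.a * self.sx) / self.n

    def estimate(self, face_width_px):
        """Return the estimated Z distance (cm) for a face width in pixels."""
        return self.a / face_width_px + self.b


# Initial user-dependent calibration with made-up sample pairs.
estimator = ZDistanceEstimator()
for width_px, z_cm in [(140, 102.0), (175, 82.0), (200, 72.0), (280, 52.0)]:
    estimator.update(width_px, z_cm)

print(round(estimator.estimate(250), 2))
```

In such a scheme, each frame for which the true focus position is later confirmed could be passed to `update`, so the per-user parameters keep improving as more samples arrive.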