In forensic applications, it is very helpful to capture face images of tracked people. However, regular CCTV cameras capture only a small number of pixels in the face region. A promising solution to this problem is to use a network of PTZ (Pan-Tilt-Zoom) cameras, since their zoom capability offers a close-up view on demand. In this paper, we address the problem of persistent people tracking and automatic face capture using a single PTZ camera. The detected faces are associated with the corresponding people and trajectories. The time-critical and dynamic nature of the problem complicates this task. Unlike previous work, which uses a mixture of wide-angle cameras and PTZ cameras, we explore the limits of what can be expected from a single PTZ camera. The system first detects and tracks pedestrians in zoomed-out mode, then uses a scheduler to select a person to zoom in on. After zooming in, the system returns to wide-area mode and solves the person-to-person, face-to-person, and face-to-face data association problems. Extensive experiments in challenging, uncontrolled outdoor conditions demonstrate the effectiveness of the proposed system.
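
The alternation between wide-area tracking and zoomed-in face capture can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the names `Mode`, `Track`, `select_target`, and `step` are hypothetical, and the scheduling rule (prefer people without a captured face, then the longest-tracked person) is an assumption chosen only for illustration. The sketch also attaches a captured face directly to the selected track, whereas the actual system has to solve the face-to-person association after returning to wide-area mode.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Mode(Enum):
    WIDE = auto()  # zoomed-out: detect and track pedestrians
    ZOOM = auto()  # zoomed-in: try to capture one person's face


@dataclass
class Track:
    track_id: int
    frames_seen: int = 0  # how long this person has been tracked
    faces: list = field(default_factory=list)  # face captures linked to the track


def select_target(tracks):
    """Hypothetical scheduler (an assumption, not the paper's policy):
    consider only people without a face capture and pick the one
    that has been tracked the longest."""
    candidates = [t for t in tracks if not t.faces]
    if not candidates:
        return None
    return max(candidates, key=lambda t: t.frames_seen)


def step(mode, tracks, target=None, captured_face=None):
    """Advance the control cycle by one step and return (next_mode, next_target)."""
    if mode is Mode.WIDE:
        target = select_target(tracks)
        if target is None:
            return Mode.WIDE, None  # nobody left to zoom in on; keep tracking
        return Mode.ZOOM, target
    # Mode.ZOOM: record the face if one was captured, then zoom back out.
    # Simplification: the face is attached directly to the selected track.
    if captured_face is not None and target is not None:
        target.faces.append(captured_face)
    return Mode.WIDE, None
```

In this sketch, a caller would invoke `step` once per control decision, passing the current track list in wide-area mode and the captured face (if any) when leaving zoom mode.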