Since Viola and Jones proposed their real-time face detection method in 2001, numerous works on object detection, person recognition, and object tracking have been published. Each method has its strengths and drawbacks, so a system that relies on a single standalone method typically achieves either speed or accuracy, but not both. In this paper, we propose a finite-state-machine (FSM) method that combines face recognition, face detection, and a tracker. It exploits the tracker's speed while retaining the ability to distinguish the person of interest from other people and from the background, thereby overcoming the limitations of standalone methods. The information produced by this image-processing stage is then sent to the hardware tracker: the image-processing stage acts as a visual sensor that provides the feedback (measured) value, i.e., the coordinates of the center point of the detected face. The 2-DOF camera platform uses Model Predictive Control (MPC) to compute the required control action, enabling it to track the target and keep it at the center of the frame. MPC was chosen because it produces an optimal control signal while explicitly accounting for input saturation. The MPC control signals yield good pan and tilt responses, with a rise time below 1 s and an overshoot below 15%. The implemented FSM also meets its design goal, achieving satisfactory performance in indoor settings.
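To make the detect, recognize, and track combination concrete, the following is a minimal sketch of such a state machine. It is not the authors' implementation. It assumes the face detector, face recognizer, and tracker run elsewhere and report their per-frame results through a hypothetical `FrameResult` record. The class `FaceFollowFSM`, the three state names, and the `reverify_every` re-verification period are illustrative assumptions. The value returned by `step` is the face center that would be passed to the pan-tilt controller as its measurement.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple

BBox = Tuple[int, int, int, int]  # (x, y, w, h) in pixels


class State(Enum):
    DETECT = auto()     # search the frame for any face
    RECOGNIZE = auto()  # check whether the detected face is the person of interest
    TRACK = auto()      # follow the confirmed face with a fast tracker


@dataclass
class FrameResult:
    """Hypothetical per-frame outputs of the three vision modules."""
    detection: Optional[BBox] = None  # face detector output, None if no face found
    is_target: bool = False           # recognizer verdict on the detection
    tracked: Optional[BBox] = None    # tracker output, None if tracking failed


def center(box: BBox) -> Tuple[float, float]:
    """Center point of a bounding box (the visual-sensor measurement)."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2)


class FaceFollowFSM:
    def __init__(self, reverify_every: int = 30):
        self.state = State.DETECT
        self.reverify_every = reverify_every  # assumed re-check period, in frames
        self.frames_tracked = 0

    def step(self, r: FrameResult) -> Optional[Tuple[float, float]]:
        """Advance one frame; return the face center for the controller, or None."""
        if self.state is State.DETECT:
            if r.detection is not None:
                self.state = State.RECOGNIZE  # a face exists; verify identity next
            return None

        if self.state is State.RECOGNIZE:
            if r.detection is not None and r.is_target:
                # Person of interest confirmed: the tracker would be (re)initialized here.
                self.state = State.TRACK
                self.frames_tracked = 0
                return center(r.detection)
            self.state = State.DETECT  # no face, or the face belongs to someone else
            return None

        # State.TRACK: use the fast tracker until it fails or re-verification is due.
        if r.tracked is None:
            self.state = State.DETECT  # tracker lost the target
            return None
        self.frames_tracked += 1
        if self.frames_tracked >= self.reverify_every:
            self.state = State.RECOGNIZE  # periodically confirm the tracker has not drifted
        return center(r.tracked)
```

The periodic return to `RECOGNIZE` is one plausible way to keep the tracker's speed while guarding against drift onto another person or the background. The actual transition conditions used in the paper may differ.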