We present a robot eye-hand coordination learning method that learns visual task specifications directly by watching human demonstrations. The task specification is represented as a task function, which is learned with inverse reinforcement learning (IRL [1]) by inferring a reward model from state transitions. The learned reward model then provides continuous feedback to an uncalibrated visual servoing (UVS [2]) controller designed for the execution phase. Because our method learns directly from raw videos, it removes the need for hand-engineered task specifications. Benefiting from the traditional UVS controller, training on the real robot is needed only for the initial Jacobian estimation, which takes 4-7 seconds on average for a new task. Moreover, the learned policy is independent of any particular robot and thus has the potential to adapt quickly to other robot platforms. Experiments show that, for a task with a given number of DOFs, our method adapts to task and environment changes in target positions, backgrounds, illumination, and occlusions.
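To make the UVS component concrete, the following is a minimal sketch (not the paper's implementation; all function names are illustrative) of an uncalibrated visual servoing loop: the image Jacobian is first estimated numerically from small exploratory motions, then each control step moves the joints along the pseudo-inverse direction of the visual error, with a rank-1 Broyden update keeping the Jacobian estimate current without any camera or robot calibration.

```python
import numpy as np

def estimate_jacobian(f, q, eps=1e-3):
    """Estimate the image Jacobian by finite differences of exploratory moves.
    f(q) returns the visual feature vector at joint configuration q."""
    f0 = f(q)
    J = np.zeros((f0.size, q.size))
    for i in range(q.size):
        dq = np.zeros_like(q)
        dq[i] = eps
        J[:, i] = (f(q + dq) - f0) / eps  # column i: feature change per joint i
    return J

def uvs_step(J, f_cur, f_goal, gain=0.5):
    """One servoing step: joint increment from the visual error via pseudo-inverse."""
    e = f_goal - f_cur
    return gain * np.linalg.pinv(J) @ e

def broyden_update(J, dq, df):
    """Rank-1 Broyden correction so J tracks the true Jacobian online."""
    return J + np.outer(df - J @ dq, dq) / (dq @ dq)
```

In the paper's setting, the hand-coded visual error `f_goal - f_cur` would be replaced by feedback derived from the learned reward model; only the initial `estimate_jacobian` call touches the real robot, which is what keeps on-robot training to a few seconds.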