This paper presents a framework for real-time, high-fidelity upper limb motion transfer from a human to a humanoid robot, with the aim of providing robust support for robots learning manipulation skills. Robot imitation of human motion is modeled as a simultaneous wrist and elbow tracking problem, with special attention paid to the coordination between multiple imitation objectives and physical constraints. First, for the vision-based motion capture system, an efficient preprocessing method is introduced to map the raw human skeletal points analogously or reflectively to the robot’s workspace. Then, the generation of velocity-level imitation motion commands is formalized as a constrained hierarchical quadratic programming (HQP) problem, in which the multiple imitation objectives are organized into a hierarchy. The wrist tracking task is given the highest priority, followed by the adjustment of the elbow and then the minimization of the joint velocity. The physical limits of the robot joints, as well as strict kinematic constraints regarding self-motion, are imposed as constraints on the optimization problem. Finally, a hierarchical recurrent neural network (RNN) is designed to solve this HQP problem through its unique neural dynamics. The performance of this network is theoretically guaranteed, and it is computationally efficient. Quantitative results show that the proposed motion imitation control scheme offers excellent integrated tracking accuracy of [Formula: see text] m and is robust to different arm length ratios. Using this method, the humanoid robot can imitate the upper body motions of a human as accurately as possible under physical constraints, without significant delays.
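To make the task hierarchy concrete, the following is a minimal sketch of a two-level strict priority scheme at the velocity level. It is not the paper's solver: it swaps the hierarchical RNN for a classical pseudoinverse with null-space projection, and it omits the joint-limit and self-motion constraints entirely. The function name `prioritized_joint_velocities`, the 4-joint arm, and the random Jacobians are all hypothetical, chosen only for illustration. The minimum-norm property of the pseudoinverse stands in for the lowest-priority objective of minimizing joint velocity.

```python
import numpy as np


def prioritized_joint_velocities(J_wrist, v_wrist, J_elbow, v_elbow):
    """Two-level strict hierarchy via null-space projection (illustrative only).

    Level 1 tracks the wrist velocity; level 2 adjusts the elbow using only
    joint motions that leave the wrist task undisturbed. Inequality
    constraints (joint limits) are NOT handled here.
    """
    n = J_wrist.shape[1]
    J1_pinv = np.linalg.pinv(J_wrist)

    # Level 1: minimum-norm joint velocity that best tracks the wrist.
    dq1 = J1_pinv @ v_wrist

    # Orthogonal projector onto the null space of the wrist Jacobian.
    N1 = np.eye(n) - J1_pinv @ J_wrist

    # Level 2: least-squares elbow correction restricted to that null space.
    dq2 = np.linalg.pinv(J_elbow @ N1) @ (v_elbow - J_elbow @ dq1)

    return dq1 + N1 @ dq2


# Synthetic example (assumed values): 4 joints, 3-D wrist and elbow tasks.
rng = np.random.default_rng(0)
J_wrist = rng.standard_normal((3, 4))
J_elbow = rng.standard_normal((3, 4))
v_wrist = np.array([0.10, 0.00, -0.05])
v_elbow = np.array([0.02, 0.03, 0.00])

dq = prioritized_joint_velocities(J_wrist, v_wrist, J_elbow, v_elbow)
wrist_error = np.linalg.norm(J_wrist @ dq - v_wrist)
elbow_error = np.linalg.norm(J_elbow @ dq - v_elbow)
print("wrist tracking error:", wrist_error)
print("elbow tracking error:", elbow_error)
```

With four joints and a three-dimensional wrist task, only one degree of freedom remains for the elbow, so the wrist command is met while the elbow is tracked only as closely as that remaining freedom allows. This ordering is what the hierarchy is meant to enforce.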