With rapid developments in biometric recognition, increasing attention is being paid to robots that interact intelligently with humans and exchange certain types of biometric information. Such human–machine interaction (HMI), also known as human–robot interaction (HRI), is expected to become an important development for automotive manufacturing applications. Hand gesture recognition-based HRI designs are already in practical use across automotive manufacturing, assembly lines, supply chains, and collaborative inspection. However, few studies have focused on material-handling robot interactions driven by the operator's hand gestures. The current work develops a depth sensor-based dynamic hand gesture recognition scheme for continuous-time operation of material-handling robots. The proposed approach employs the Kinect depth sensor to extract Hu moment invariant features from depth data, on which feature-based template-matching hand gesture recognition is built. To construct continuous-time robot operations from dynamic hand gestures formed by concatenating a series of gesture actions, a wake-up reminder scheme based on fingertip detection is established to accurately mark the starting, ending, and switching timestamps of the gesture sequence. To enable template matching on continuous-time dynamic hand gestures in real time, representative-frame estimation using centroid, middle, and middle-region voting approaches is also presented and combined with the template-matching computations. Experimental results show that, over continuous-time periods, the proposed hand gesture recognition framework provides smoother operation of the material-handling robot than control based on extraction of full frames alone; representative frames estimated by middle-region voting keep computation fast while still reaching a competitive recognition accuracy of 90.8%. The method proposed in this study can facilitate smart assembly lines and human–robot collaboration in automotive manufacturing.
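As a rough illustration of the feature-based template matching summarized above (a minimal sketch, not the authors' exact pipeline), the following Python code assumes OpenCV and NumPy, a hypothetical depth range for segmenting the hand silhouette, and a small dictionary of pre-computed gesture templates; all parameter values and names are illustrative assumptions.

```python
import cv2
import numpy as np

def hu_features(depth_frame, near_mm=400, far_mm=900):
    """Segment the hand by an assumed depth range and return
    log-scaled Hu moment invariants of the resulting silhouette."""
    # Binary hand silhouette: pixels whose depth falls in the assumed range.
    mask = ((depth_frame >= near_mm) & (depth_frame <= far_mm)).astype(np.uint8)
    m = cv2.moments(mask, binaryImage=True)      # raw spatial moments of the silhouette
    hu = cv2.HuMoments(m).flatten()              # the 7 Hu moment invariants
    # Log-scale for numerical stability, preserving the sign of each invariant.
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)

def match_gesture(depth_frame, templates):
    """Nearest-template match in Hu-feature space.
    `templates` maps gesture labels to pre-computed feature vectors."""
    feat = hu_features(depth_frame)
    return min(templates, key=lambda label: np.linalg.norm(feat - templates[label]))

# Hypothetical usage: depth_frame is a 16-bit depth image from the Kinect,
# and templates were built offline from labeled representative frames.
# label = match_gesture(depth_frame, templates)
```

In a continuous-time setting, such a matcher would be applied not to every frame but to the representative frame selected by one of the voting schemes (centroid, middle, or middle-region) between the start and end timestamps detected by the fingertip-based wake-up scheme.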