Hand gesture recognition plays an important role in human-robot interaction. The accuracy and reliability of hand gesture recognition are the keys to gesture-based human-robot interaction tasks. To solve this problem, a method based on multimodal data fusion and multiscale parallel convolutional neural network (CNN) is proposed in this paper to improve the accuracy and reliability of hand gesture recognition. First of all, data fusion is conducted on the sEMG signal, the RGB image, and the depth image of hand gestures. Then, the fused images are generated to two different scale images by downsampling, which are respectively input into two subnetworks of the parallel CNN to obtain two hand gesture recognition results. After that, hand gesture recognition results of the parallel CNN are combined to obtain the final hand gesture recognition result. Finally, experiments are carried out on a self-made database containing 10 common hand gestures, which verify the effectiveness and superiority of the proposed method for hand gesture recognition. In addition, the proposed method is applied to a seven-degree-of-freedom bionic manipulator to achieve robotic manipulation with hand gestures.