Human gesture recognition is an attractive research area in computer vision with many applications, such as human-machine interaction and virtual reality. Recent deep learning techniques have been applied effectively to gesture recognition, but they require large and diverse training data. However, the available gesture datasets contain mostly static gestures and/or a few fixed viewpoints; some contain dynamic gestures, but these are not diverse in poses and viewpoints. In this paper, we propose a novel end-to-end framework for dynamic gesture recognition from unknown viewpoints. It has two main components: (1) an efficient GAN-based architecture, named ArVi-MoCoGAN, and (2) a gesture recognition component consisting of C3D backbones and an attention unit. ArVi-MoCoGAN generates videos at multiple fixed viewpoints from a real dynamic gesture captured at an arbitrary viewpoint. It also estimates, for a real gesture at an arbitrary viewpoint, the probability of its belonging to each of the fixed viewpoints. These outputs of ArVi-MoCoGAN are then processed by the recognition component to improve arbitrary-view recognition performance through multi-view synthetic gestures. The proposed system is extensively analyzed and evaluated on four standard dynamic gesture datasets. Our method outperforms current solutions by 1% to 13.58% for arbitrary-view gesture recognition and by 1.2% to 7.8% for single-view gesture recognition.