The recognition of hand gestures in cluttered or complex environments is a vital research area in human-computer interaction and computer vision because of its many potential applications, such as hand action analysis, monitoring of driver hand behaviour, virtual reality, pose estimation, human action recognition, and sign language recognition. Various approaches have been proposed in recent years to create more reliable and efficient algorithms in this field; however, a robust system remains elusive. This study therefore proposes a new deep learning-based hybrid architecture for classifying hand gestures. The study makes two main contributions to the literature. The first is a new database for hand gesture recognition. The second is a novel hybrid architecture that combines two widely used pre-trained network models in an optimised manner, using a genetic algorithm for hyperparameter optimisation. The proposed method comprises five main phases: data acquisition, pre-processing, design of the deep neural network architecture, hyperparameter optimisation, and training of the proposed architecture. The method was tested on three comprehensive datasets. The experimental results show that it classifies hand gestures with high accuracy and outperforms state-of-the-art methods in the literature.
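The abstract does not specify the genetic algorithm's encoding, operators, or the hyperparameters it tunes. As a minimal sketch under stated assumptions, the hyperparameter optimisation phase could be organised as below: a discrete search space over three hypothetical hyperparameters (`learning_rate`, `batch_size`, `dropout`), tournament selection, uniform crossover, random mutation, and elitism. The `toy_fitness` function is a hypothetical stand-in; in the actual method the fitness would presumably be the validation accuracy of the hybrid network trained with each candidate configuration.

```python
import random
from typing import Callable, Dict, List

# Hypothetical search space: the hyperparameters and their values are
# illustrative assumptions, not those used in the study.
SEARCH_SPACE: Dict[str, List] = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "batch_size": [16, 32, 64],
    "dropout": [0.2, 0.3, 0.5],
}

Genome = Dict[str, object]


def random_genome(rng: random.Random) -> Genome:
    # Sample one value per hyperparameter to form an individual.
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    # Uniform crossover: each gene is copied from either parent with p = 0.5.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SEARCH_SPACE}


def mutate(g: Genome, rng: random.Random, rate: float = 0.2) -> Genome:
    # Each gene is resampled from its allowed values with probability `rate`.
    return {k: (rng.choice(SEARCH_SPACE[k]) if rng.random() < rate else v)
            for k, v in g.items()}


def tournament(pop: List[Genome], scores: List[float],
               rng: random.Random, k: int = 3) -> Genome:
    # Pick k distinct individuals at random and return the fittest.
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]


def genetic_search(fitness: Callable[[Genome], float], pop_size: int = 10,
                   generations: int = 15, elite: int = 2, seed: int = 0):
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(g) for g in pop]
        order = sorted(range(pop_size), key=lambda i: scores[i], reverse=True)
        # Elitism: the best individuals survive unchanged, so the best
        # score never decreases from one generation to the next.
        next_pop = [pop[i] for i in order[:elite]]
        while len(next_pop) < pop_size:
            p1 = tournament(pop, scores, rng)
            p2 = tournament(pop, scores, rng)
            next_pop.append(mutate(crossover(p1, p2, rng), rng))
        pop = next_pop
    scores = [fitness(g) for g in pop]
    best = max(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]


def toy_fitness(g: Genome) -> float:
    # Hypothetical surrogate for validation accuracy (higher is better);
    # it peaks at learning_rate=1e-3, batch_size=32, dropout=0.3.
    return (-abs(g["learning_rate"] - 1e-3) * 100
            - abs(g["batch_size"] - 32) / 32
            - abs(g["dropout"] - 0.3))


if __name__ == "__main__":
    best, score = genetic_search(toy_fitness)
    print(best, score)
```

Elitism is used here because each fitness evaluation (a full training run) is expensive, so losing a good configuration to an unlucky crossover would waste compute; the fixed seed makes the search reproducible.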