Human action recognition is an active research topic with applications in many fields. Deep learning has achieved good results on this task, but it remains challenging when only a few training samples have been collected. To address this challenge and improve recognition accuracy, we propose the stepwise generative recognizable network, based on the generative adversarial network, which expands a limited training set and then performs recognition. First, the stepwise generative recognizable network is designed to combine image generation and recognition for human actions. Second, a structural similarity constraint is introduced into this network, yielding the structural similar stepwise generative recognizable network, which compares generated images with real data to improve the quality and diversity of the generated images. Finally, the proposed networks are evaluated on common databases and on a self-built database collected in daily life. We achieve recognition accuracies of 97.14%, 94.88% and 99.69% on the MNIST, Weizmann and self-built datasets, respectively. The experimental results show that combining generation and recognition improves recognition accuracy without abundant training data, and that the structural similarity constraint not only improves the quality and diversity of the generated images but also leads to better convergence. The structural similar stepwise generative recognizable network reduces the workload of manual data collection and addresses the low recognition accuracy caused by limited training samples by expanding the sample set in a natural way.
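The idea of comparing generated images with real data through structural similarity can be illustrated with a minimal sketch. It assumes the constraint resembles the standard single-window SSIM index (mean, variance and covariance terms with the usual stabilizing constants); the abstract does not give the exact formulation. The names `global_ssim`, `ssim_penalty`, `generator_loss` and the weight `lam` are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two grayscale images given as flat
    sequences of pixel values in [0, data_range]."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and of equal size")
    # Stabilizing constants from the standard SSIM definition.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean([(a - mu_x) ** 2 for a in x])
    var_y = fmean([(b - mu_y) ** 2 for b in y])
    cov = fmean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def ssim_penalty(generated, real):
    """Hypothetical penalty: near 0 when the images are structurally
    identical, larger as they diverge."""
    return 1.0 - global_ssim(generated, real)


def generator_loss(adversarial_loss, generated, real, lam=1.0):
    """Assumed combination: adversarial term plus a weighted SSIM penalty.
    The weighting scheme is an illustration, not the paper's loss."""
    return adversarial_loss + lam * ssim_penalty(generated, real)


if __name__ == "__main__":
    real = [0.0, 1.0, 0.0, 1.0]       # toy 2x2 image, flattened
    inverted = [1.0, 0.0, 1.0, 0.0]   # structurally opposite image
    print(global_ssim(real, real), global_ssim(real, inverted))
```

In practice SSIM is usually computed over local sliding windows and averaged, and a training loop would use a differentiable tensor implementation; the single-window version above only shows which statistics the similarity term compares.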