Existing methods for generating multi-track music often fall short of market requirements in melody, rhythm, and harmony, and much of the music they generate violates basic music theory. To address these problems, this paper proposes a multi-track music synthesis model based on an improved WGAN-GP and guided by music theory rules, aiming to generate works of high musicality. The improved WGAN-GP is obtained by modifying the adversarial loss function and introducing a self-attention mechanism; it is then applied to multi-track music synthesis, and the model's performance is evaluated both subjectively and objectively. Multi-track music synthesized by the proposed model receives an overall score of 8.22, higher than the 8.04 achieved by real human works, and its average scores on the four indexes of rhythm, melody, emotion, and harmony are 8.15, 8.27, 7.61, and 8.22, respectively, exceeding those of the MuseGAN, MTMG, and HRNN models on all indexes except emotion. The model's data processing accuracy, error rate, training loss, and track matching are 94.47%, 0.15%, 0.91, and 0.84, respectively, all better than those of WGAN-GP and MuseGAN. The gap between the music theory properties of music synthesized by the proposed model and those of real music is very small, fully meeting practical needs. The deep learning model constructed in this paper thus offers a new path for multi-track music generation.
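For reference, the standard WGAN-GP objective that the proposed model modifies can be written as follows (this is the well-known formulation of Gulrajani et al.; the abstract does not specify the exact form of the paper's improved loss, so only the baseline is shown here):

```latex
L = \underbrace{\mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}\!\left[D(\tilde{x})\right]
  - \mathbb{E}_{x \sim \mathbb{P}_r}\!\left[D(x)\right]}_{\text{Wasserstein critic loss}}
  + \underbrace{\lambda \, \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}}\!\left[\left(\left\lVert \nabla_{\hat{x}} D(\hat{x}) \right\rVert_2 - 1\right)^2\right]}_{\text{gradient penalty}}
```

Here $\mathbb{P}_r$ is the real-data distribution, $\mathbb{P}_g$ the generator distribution, $\mathbb{P}_{\hat{x}}$ samples uniformly along straight lines between real and generated pairs, and $\lambda$ is the penalty coefficient. The paper's improvement adds modifications to this adversarial loss together with a self-attention mechanism in the networks.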