Music consists of regular sound waves that are typically ordered and contain many repetitive structures: important notes, chords, and musical fragments often recur. Such repeated fragments (referred to as motifs) are usually the soul of a song. However, most music produced by existing music generation methods lacks the distinct motifs found in real music. This study proposes a novel multi-encoder model, called Motif Transformer, to generate music containing more motifs. The model is built on an encoder-decoder framework comprising an original encoder, a bidirectional long short-term memory attention encoder (abbreviated as BiLSTM-attention encoder), and a gated decoder. The original encoder is taken from the Transformer's encoder, while the BiLSTM-attention encoder is constructed from a bidirectional long short-term memory network (BiLSTM) and an attention mechanism. Both the original encoder and the BiLSTM-attention encoder encode the motifs and pass the resulting representations to the gated decoder. The gated decoder decodes the full music input together with the information passed by the encoders, using a gating mechanism to strengthen the model's ability to capture the music's motifs and thereby generate music with clearly repeated fragments. In addition, to better measure a model's ability to generate motifs, this study proposes an evaluation metric called used motifs. Experiments on multiple music-domain metrics show that the proposed model generates smoother and more pleasing music, and that the generated music contains more motifs.
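To make the described architecture concrete, the following is a minimal PyTorch sketch of a dual-encoder model with gated fusion, assuming the abstract's description: a Transformer encoder and a BiLSTM-attention encoder both read the motif, and their outputs are combined by a learned gate before conditioning the decoder. All module names, dimensions, and the sigmoid-gate fusion formula are illustrative assumptions, since the abstract does not specify them; this is not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class GatedMotifFusion(nn.Module):
    """Hypothetical sketch: fuse two encoder views of the motif with a learned gate.

    The abstract only says the decoder combines both encoders' outputs 'in a
    gated manner'; the sigmoid-gate convex combination below is an assumption.
    """

    def __init__(self, d_model: int):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, h_transformer: torch.Tensor, h_bilstm: torch.Tensor) -> torch.Tensor:
        # Both inputs: (batch, motif_len, d_model).
        g = torch.sigmoid(self.gate(torch.cat([h_transformer, h_bilstm], dim=-1)))
        # Per-feature convex combination of the two motif encodings.
        return g * h_transformer + (1 - g) * h_bilstm


class MotifTransformerSketch(nn.Module):
    """Minimal encoder-decoder skeleton following the abstract's description."""

    def __init__(self, vocab_size: int = 512, d_model: int = 256, nhead: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # "Original encoder": a standard Transformer encoder over the motif.
        self.original_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers=2
        )
        # "BiLSTM-attention encoder": BiLSTM followed by self-attention.
        self.bilstm = nn.LSTM(d_model, d_model // 2, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.fusion = GatedMotifFusion(d_model)
        # Stand-in for the gated decoder: a Transformer decoder that attends
        # to the fused motif memory while decoding the full music sequence.
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers=2
        )
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, motif_tokens: torch.Tensor, music_tokens: torch.Tensor) -> torch.Tensor:
        motif = self.embed(motif_tokens)               # (batch, motif_len, d_model)
        h_tr = self.original_encoder(motif)            # Transformer view of the motif
        h_lstm, _ = self.bilstm(motif)                 # BiLSTM view of the motif
        h_lstm, _ = self.attn(h_lstm, h_lstm, h_lstm)  # self-attention over BiLSTM states
        memory = self.fusion(h_tr, h_lstm)             # gated motif representation
        tgt = self.embed(music_tokens)                 # full music sequence so far
        # Causal mask so each position only attends to earlier music tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(music_tokens.size(1))
        dec = self.decoder(tgt, memory, tgt_mask=mask)
        return self.out(dec)                           # next-token logits


# Usage: predict next-token logits for a toy batch of motif/music pairs.
model = MotifTransformerSketch()
motif = torch.randint(0, 512, (2, 16))   # two motifs of 16 tokens each
music = torch.randint(0, 512, (2, 64))   # two partial music sequences
logits = model(motif, music)
print(logits.shape)  # torch.Size([2, 64, 512])
```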