Answer generation is one of the most important tasks in natural language processing, and deep learning-based methods have shown their strength over traditional machine learning methods. However, most previous deep learning-based answer generation models are built on recurrent neural networks (RNNs) or convolutional neural networks (CNNs). The former cannot fully exploit the contextual correlations preserved in paragraphs because of its inherently sequential computation; the latter, constrained by a fixed convolutional kernel size, cannot extract complete semantic features. To alleviate these problems, we propose an end-to-end answer generation model (AG-MTA) built on a multi-layer Transformer aggregation encoder. AG-MTA consists of a multi-layer attention Transformer unit and a multi-layer attention Transformer aggregation encoder (MTA). It attends to information at different positions and aggregates nodes within the same layer to combine contextual information, fusing semantic information from the base layer up to the top layer and thereby enriching the encoder's representation. Furthermore, a novel position encoding method based on trigonometric functions is proposed. Experiments are conducted on the public SQuAD dataset. AG-MTA reaches state-of-the-art performance, with an EM score of 71.1 and an F1 score of 80.3.

INDEX TERMS Question answering system, natural language processing, self-attention mechanism, Transformer coding structure.
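The abstract names two mechanisms without implementation detail: a trigonometric position encoding and an encoder that aggregates representations across layers. The PyTorch sketch below is a minimal illustration of both ideas as we read them, not the authors' code: it uses the standard sine/cosine position encoding as a stand-in for the paper's novel trigonometric variant (whose exact form is not given here), and models "aggregation" as a learned weighted sum of all layer outputs. All names and hyperparameters (`AggregatingEncoder`, `d_model`, `layer_weights`, etc.) are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

def sinusoidal_position_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """Standard trigonometric (sine/cosine) position encoding; a stand-in
    for the paper's proposed variant, which the abstract does not specify."""
    pos = torch.arange(max_len).unsqueeze(1).float()             # (max_len, 1)
    div = torch.exp(torch.arange(0, d_model, 2).float()
                    * (-math.log(10000.0) / d_model))            # (d_model/2,)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe                                                    # (max_len, d_model)

class AggregatingEncoder(nn.Module):
    """Hypothetical sketch: stacks Transformer layers and fuses every
    layer's output so that semantic information from the base layer to
    the top layer reaches the final representation."""
    def __init__(self, d_model=512, n_heads=8, n_layers=6, max_len=512):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        ])
        self.register_buffer("pe", sinusoidal_position_encoding(max_len, d_model))
        # Learnable weights deciding how much each layer contributes (assumed).
        self.layer_weights = nn.Parameter(torch.ones(n_layers) / n_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) token embeddings
        h = x + self.pe[: x.size(1)]
        per_layer = []
        for layer in self.layers:
            h = layer(h)
            per_layer.append(h)
        # Aggregate same-position states across all layers (weighted sum).
        stacked = torch.stack(per_layer, dim=0)                  # (L, B, S, D)
        w = torch.softmax(self.layer_weights, dim=0).view(-1, 1, 1, 1)
        return (stacked * w).sum(dim=0)

# Usage: encode a batch of 2 sequences of 16 already-embedded tokens.
enc = AggregatingEncoder()
out = enc(torch.randn(2, 16, 512))
print(out.shape)  # torch.Size([2, 16, 512])
```

One design note on the sketch: a weighted sum is only one plausible reading of "aggregating nodes at the same layer to combine context"; concatenation followed by a projection, or gated fusion, would fit the abstract's description equally well.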