In intra coding, template matching prediction is an effective method for reducing the non-local redundancy in image content. However, the prediction indicated by the best template match is not always the best prediction for the current block. To address this problem, we propose a method that merges multiple template matching predictions through a convolutional neural network with an attention module. The convolutional neural network explores different combinations of the candidate template matching predictions, while the attention module focuses on determining the most significant prediction candidate. In addition, the spatial module of the attention mechanism can be used to model the relationship between the original pixels in the current block and the reconstructed pixels in adjacent regions (the template). Compared with directional intra prediction and traditional template matching prediction, our method provides a unified framework for generating highly accurate predictions. Experimental results show that, compared with the averaging strategy, the BD-rate reductions reach up to 4.7%, 5.5%, and 18.3% on the classic standard sequences (Class B to Class F), the SIQAD dataset (screen content), and the Urban100 dataset (natural scenes), respectively, while the average bit-rate savings are 0.5%, 2.7%, and 1.8%, respectively.
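To make the setting concrete, the following is a minimal sketch of multi-candidate template matching and candidate fusion, not the proposed network. It assumes an L-shaped template, a sum-of-squared-differences (SSD) matching cost, and a toy causality rule; the function names (`l_template`, `top_k_candidates`, `softmax_weights`, `fuse`) and all parameters are hypothetical. The softmax weighting over template costs is only a hand-crafted stand-in for the learned attention module, and uniform weights reproduce the averaging baseline.

```python
import numpy as np


def l_template(img, y, x, bs, t):
    """L-shaped template of the bs x bs block at (y, x): t rows above, t columns left."""
    top = img[y - t:y, x - t:x + bs]
    left = img[y:y + bs, x - t:x]
    return np.concatenate([top.ravel(), left.ravel()]).astype(np.float64)


def top_k_candidates(recon, y, x, bs=4, t=2, k=3, search=8):
    """Return (costs, blocks) of the k candidates whose templates best match (lowest SSD).

    Toy causality assumption: rows above y, and columns left of x in the
    current block rows, are treated as already reconstructed.
    """
    target = l_template(recon, y, x, bs, t)
    scored = []
    for cy in range(max(t, y - search), y + 1):
        for cx in range(max(t, x - search), min(recon.shape[1] - bs, x + search) + 1):
            above = cy + bs <= y
            left = cy <= y and cx + bs <= x
            if not (above or left):
                continue  # candidate would overlap pixels that are not yet reconstructed
            cost = float(np.sum((l_template(recon, cy, cx, bs, t) - target) ** 2))
            block = recon[cy:cy + bs, cx:cx + bs].astype(np.float64)
            scored.append((cost, block))
    scored.sort(key=lambda s: s[0])  # sort by cost only; blocks are never compared
    best = scored[:k]
    return np.array([c for c, _ in best]), np.stack([b for _, b in best])


def softmax_weights(costs, temperature=10.0):
    """Hand-crafted weights: a lower template cost gets a larger weight (assumption)."""
    costs = np.asarray(costs, dtype=np.float64)
    w = np.exp(-(costs - costs.min()) / temperature)
    return w / w.sum()


def fuse(blocks, weights):
    """Weighted sum of candidate predictions; uniform weights give the averaging baseline."""
    return np.tensordot(weights, blocks, axes=1)
```

With `weights = np.full(k, 1.0 / k)`, `fuse` reduces to the averaging strategy used as the comparison baseline; a learned fusion would replace `softmax_weights` with weights (or a full mapping) predicted from the candidates and the template.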