To address the imbalance of training samples and the neglect of temporal information in existing Siamese-based trackers, this paper proposes a Siamese recurrent neural network and region proposal network (Siamese R-RPN), which can be trained end-to-end. Siamese R-RPN consists of a Siamese network, a recurrent neural network, and a region proposal network (RPN). Image features extracted by the Siamese network are strengthened by channel and spatial attention mechanisms and are sent to the RPN for classification and regression. Temporal information is processed by a Long Short-Term Memory (LSTM) recurrent network to predict the rough location of the target, which is then mapped to the anchor feature map of the RPN for anchor selection. This makes the positive and negative samples that participate in training more balanced and representative. By using temporal and spatial information jointly, the proposed tracker achieves state-of-the-art performance on three large tracking benchmarks (OTB 2015, VOT 2016, and VOT 2018), which verifies its effectiveness.
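The anchor-selection step can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `select_anchor_mask`, the feature-map size, the stride, the square window, and the `radius` hyperparameter are all assumptions made for illustration. The sketch only shows the general idea of mapping a rough predicted location onto the anchor feature map and keeping the anchors near it.

```python
import numpy as np

def select_anchor_mask(pred_center_xy, feat_size, stride, radius):
    """Return a boolean (feat_size, feat_size) mask of anchor cells to keep.

    pred_center_xy: rough target centre in search-image pixels
                    (e.g. an LSTM prediction; assumed input format).
    feat_size:      side length of the anchor feature map (assumed value).
    stride:         pixels per feature-map cell (assumed, no offset applied).
    radius:         half-width in cells of the kept window (hypothetical).
    """
    cx, cy = pred_center_xy
    # Map pixel coordinates to the nearest feature-map cell, clamped to the grid.
    col = int(np.clip(round(cx / stride), 0, feat_size - 1))
    row = int(np.clip(round(cy / stride), 0, feat_size - 1))
    # Keep every cell within `radius` cells of the predicted location.
    rows, cols = np.ogrid[:feat_size, :feat_size]
    return (np.abs(rows - row) <= radius) & (np.abs(cols - col) <= radius)

# Example: a predicted centre at pixel (64, 96) on a 17x17 map with stride 8.
mask = select_anchor_mask((64.0, 96.0), feat_size=17, stride=8, radius=2)
```

Only anchors whose cells fall inside the mask would then contribute positive and negative samples, which is one plausible way to restrict training to a more balanced set of candidates around the target.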