Spatial relation recognition, which aims to predict a spatial relation predicate, has attracted increasing attention in computer vision. In tackling this problem, modeling the spatial relation between subjects and objects is of great importance. We find that using spatial features alone leads to poor results in predicting the spatial relation. To overcome this limitation, we propose an effective spatial attention module that enhances spatial features using semantic features. Having established the importance of the spatial attention mechanism, we further propose a spatial transformer module with encoder layers that builds on this mechanism to recognize unseen spatial relations. Extensive experiments on the benchmark dataset SpatialSense show that, by using refined spatial features, our spatial transformer model and spatial attention model achieve state-of-the-art performance in overall accuracy.
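
To make the idea of enhancing spatial features with semantic features concrete, the following is a minimal sketch of one plausible reading: scaled dot-product attention in which a single semantic vector acts as the query over a set of spatial feature tokens, and the attended context is added back to each token as a residual. This is an illustrative assumption, not the module described in the paper. The function name `refine_spatial`, the single-query form, the residual addition, and the shared feature dimension are all hypothetical choices made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def refine_spatial(spatial_tokens, semantic_query):
    """Hypothetical semantic-guided attention over spatial feature tokens.

    spatial_tokens: list of d-dimensional spatial feature vectors (assumed).
    semantic_query: one d-dimensional semantic vector (assumed, e.g. derived
                    from subject and object embeddings).
    Returns (refined_tokens, attention_weights).
    """
    d = len(semantic_query)
    # Score each spatial token against the semantic query (scaled dot product).
    scores = [dot(semantic_query, t) / math.sqrt(d) for t in spatial_tokens]
    weights = softmax(scores)
    # Attention context: weighted sum of the spatial tokens.
    context = [
        sum(w * t[i] for w, t in zip(weights, spatial_tokens)) for i in range(d)
    ]
    # Residual connection (an assumption): add the context to every token.
    refined = [[t[i] + context[i] for i in range(d)] for t in spatial_tokens]
    return refined, weights


# Toy usage with two 2-dimensional spatial tokens and one semantic query.
spatial = [[1.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.0]
refined, weights = refine_spatial(spatial, query)
print(weights)
print(refined)
```

In this toy example the first token is aligned with the query and therefore receives the larger attention weight. A transformer encoder layer would typically stack such attention with multiple heads, learned projections, and a feed-forward sublayer, none of which are modeled here.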