Credit card fraud is a serious problem that imposes considerable costs on both cardholders and issuing institutions. Contemporary methods apply machine learning to detect fraudulent behavior from transaction records. However, manually engineering features requires domain knowledge and may lag behind the modus operandi of fraudsters, so a detector needs to focus automatically on the most relevant patterns in fraudulent behavior. In this work, we therefore propose a spatial-temporal attention-based neural network (STAN) for fraud detection. In particular, transaction records are modeled with attention and 3D convolution mechanisms that integrate the corresponding spatial and temporal behavioral information. The attentional weights are learned jointly, in an end-to-end manner, with the 3D convolution and detection networks. We conduct extensive experiments on a real-world fraud transaction dataset; the results show that STAN outperforms other state-of-the-art baselines in terms of both AUC and precision-recall curves. Moreover, we conduct empirical studies with domain experts on using the proposed method for fraud post-analysis; the results demonstrate its effectiveness in both detecting suspicious transactions and mining fraud patterns.
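As a rough illustration of how attention and 3D convolution can be combined over a transaction feature tensor, the following minimal sketch reweights temporal slices with softmax attention and then applies a naive 3D convolution. It is not the STAN architecture itself: the tensor layout (time window × spatial window × feature), the fixed attention scores, and all function names are assumptions made for this example, and in a trained model the scores and kernel would be learned end to end rather than supplied by hand.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(tensor, scores):
    """Scale each temporal slice tensor[t] by softmax(scores)[t].

    tensor is indexed as tensor[time][space][feature] (assumed layout).
    scores are hypothetical, hand-set attention logits, one per time slice.
    """
    weights = softmax(scores)
    return [
        [[w * v for v in row] for row in plane]
        for w, plane in zip(weights, tensor)
    ]


def conv3d_valid(tensor, kernel):
    """Naive single-channel 3D cross-correlation with 'valid' padding."""
    T, S, F = len(tensor), len(tensor[0]), len(tensor[0][0])
    kt, ks, kf = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for s in range(S - ks + 1):
            row = []
            for f in range(F - kf + 1):
                acc = 0.0
                for a in range(kt):
                    for b in range(ks):
                        for c in range(kf):
                            acc += tensor[t + a][s + b][f + c] * kernel[a][b][c]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy 2x2x2 tensor of ones and an all-ones 2x2x2 kernel (illustrative values only).
toy = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
kernel = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]

attended = temporal_attention(toy, scores=[0.0, 0.0])  # equal weights of 0.5
features = conv3d_valid(attended, kernel)
print(features)
```

With equal scores, each temporal slice receives weight 0.5, so the single output value is the sum of eight entries of 0.5. In practice, a deep learning framework's 3D convolution layer would replace the nested loops, and the attention scores would be produced by a learned scoring function.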