To address increasingly severe security threats to financial systems, this paper proposes Finsformer, a novel financial attack detection model. Finsformer combines the Transformer architecture with a cluster-attention mechanism to detect financial attack behavior more accurately and to counter complex, varied attack strategies. A key strength of the model is its ability to capture salient information and patterns in financial transaction data. In comparative experiments against baseline deep learning models (RNN, LSTM, Transformer, and BERT), Finsformer performed best on precision, recall, and accuracy, reaching 0.97, 0.94, and 0.95, respectively. Ablation studies that compare different feature extractors further confirm that the Transformer feature extractor is effective for processing complex financial data. The experiments also show that the model's performance depends heavily on the quality and scale of the training data, and that the model may face computational-resource and efficiency challenges in practical deployment. Future work will focus on improving Finsformer's computational efficiency, extending it to further application scenarios, and evaluating it on larger and more diverse datasets.
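For reference, the three reported metrics follow the standard confusion-matrix definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and accuracy = (TP + TN) / N. The minimal sketch below assumes a binary setup in which label 1 marks an attack and label 0 marks benign activity. The function name `binary_metrics` is hypothetical and is not taken from the Finsformer implementation.

```python
from __future__ import annotations

from typing import Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Compute precision, recall, and accuracy for a binary detector.

    Assumption (illustrative only): label 1 = attack, label 0 = benign.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    # Guard against division by zero when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0   # flagged attacks that were real
    recall = tp / (tp + fn) if tp + fn else 0.0      # real attacks that were flagged
    accuracy = (tp + tn) / len(y_true) if len(y_true) else 0.0  # all correct decisions

    return {"precision": precision, "recall": recall, "accuracy": accuracy}
```

For example, with true labels `[1, 1, 1, 0, 0]` and predictions `[1, 1, 0, 0, 1]`, there are two true positives, one false positive, one false negative, and one true negative, giving a precision and recall of 2/3 and an accuracy of 0.6. A high precision combined with a somewhat lower recall, as reported for Finsformer, means the model rarely raises false alarms but misses a small share of actual attacks.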