Molecular communication (MC) enables communication at the nanoscale, where traditional electromagnetic waves are ineffective, and accurate signal detection is essential for its practical implementation. However, because accurate mathematical models are lacking, statistical signal detection methods are not applicable, and existing deep learning-based models are relatively simple in design. This paper integrates ideas from natural language processing into MC and proposes MCFormer, a detector based on the classical Transformer model. Additionally, we propose an accelerated particle-based simulation algorithm that uses matrix operations to rapidly generate high-quality training data with lower complexity than traditional methods. The experimental results demonstrate that MCFormer achieves nearly optimal accuracy in a noise-free environment, surpassing the performance of a Deep Neural Network (DNN). Moreover, MCFormer can achieve optimal performance in environments with significant levels of unknown noise. All code is available at https://github.com/Xiwen-Lu/MCFormer.
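
To illustrate what a particle-based simulation using matrix operations can look like, the following is a minimal sketch and not the algorithm from the paper or its repository. It assumes free Brownian diffusion in three dimensions, a point release at the origin, and a fully absorbing spherical receiver. The function name `simulate_hits` and all parameters and values are hypothetical, chosen only for illustration.

```python
import numpy as np

def simulate_hits(n_molecules, n_steps, dt, diff_coef, rx_center, rx_radius, seed=0):
    """Count molecules absorbed by a spherical receiver at each time step.

    Illustrative sketch only: free 3D Brownian motion, point release at the
    origin, fully absorbing receiver. All names here are hypothetical.
    """
    rng = np.random.default_rng(seed)
    rx_center = np.asarray(rx_center, dtype=float)
    pos = np.zeros((n_molecules, 3))          # every molecule starts at the origin
    alive = np.ones(n_molecules, dtype=bool)  # False once a molecule is absorbed
    sigma = np.sqrt(2.0 * diff_coef * dt)     # per-axis displacement std per step
    hits = np.zeros(n_steps, dtype=int)
    for t in range(n_steps):
        # One array operation displaces all molecules at once, with no
        # per-molecule loop. Absorbed molecules keep moving but are masked out.
        pos += rng.normal(0.0, sigma, size=pos.shape)
        dist = np.linalg.norm(pos - rx_center, axis=1)
        absorbed = alive & (dist <= rx_radius)
        hits[t] = absorbed.sum()
        alive &= ~absorbed
    return hits

# Example run with assumed values (lengths in micrometres, time in seconds).
hits = simulate_hits(n_molecules=1000, n_steps=50, dt=1e-3,
                     diff_coef=100.0, rx_center=[5.0, 0.0, 0.0], rx_radius=2.0)
```

The remaining loop runs over time steps only; the per-molecule work is expressed as whole-array operations, which is where a matrix formulation would typically save time over a molecule-by-molecule loop.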