Current self-attention-based Transformer models in the field of fault diagnosis are limited to identifying correlation information within a single sequence and cannot capture both the time- and frequency-domain fault characteristics of the original signal. To address these limitations, this research introduces a two-channel Transformer fault diagnosis model that integrates time- and frequency-domain features through a cross-attention mechanism. First, the original time-domain fault signal is converted to the frequency domain using the Fast Fourier Transform, and global and local features are then extracted from each domain via a Convolutional Neural Network. Next, the self-attention mechanism of the two-channel Transformer models the long-range dependencies within each sequence separately, and the resulting features are fed into the cross-attention feature-fusion module. During fusion, the frequency-domain features serve as the query sequence Q, while the time-domain features provide the key and value sequences K and V. By computing the attention weights between Q and K, the model extracts deeper fault features of the original signal. Besides preserving the intra-sequence associative information learned via the self-attention mechanism, the Twins Transformer also models the degree of association between the two feature sequences through the cross-attention mechanism. Finally, the proposed model's performance was validated in four experiments on four bearing datasets, achieving average accuracy rates of 99.67%, 98.76%, 98.47%, and 99.41%. These results confirm that the model effectively extracts correlated time- and frequency-domain features, demonstrating fast convergence, superior performance, and high accuracy.
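To make the described pipeline concrete, the following is a minimal PyTorch sketch, not the authors' implementation: FFT conversion of the raw signal, stand-in 1D-CNN feature extractors, one self-attention encoder layer per channel, and a cross-attention fusion module in which the frequency-domain features form the query Q and the time-domain features supply the keys K and values V. All module names, dimensions, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """Fuses frequency-domain features (query) with time-domain features (key/value)."""

    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, freq_feats: torch.Tensor, time_feats: torch.Tensor) -> torch.Tensor:
        # Q from the frequency-domain branch; K and V from the time-domain branch.
        fused, _ = self.attn(query=freq_feats, key=time_feats, value=time_feats)
        # Residual connection keeps the intra-sequence information learned by self-attention.
        return self.norm(freq_feats + fused)


if __name__ == "__main__":
    batch, sig_len, d_model = 8, 1024, 64
    raw = torch.randn(batch, sig_len)        # raw time-domain vibration signal
    spec = torch.fft.rfft(raw).abs()         # frequency-domain magnitude via FFT

    # Stand-ins for the CNN feature extractors of the two channels.
    time_cnn = nn.Conv1d(1, d_model, kernel_size=16, stride=8)
    freq_cnn = nn.Conv1d(1, d_model, kernel_size=16, stride=8)
    time_feats = time_cnn(raw.unsqueeze(1)).transpose(1, 2)   # (8, 127, 64)
    freq_feats = freq_cnn(spec.unsqueeze(1)).transpose(1, 2)  # (8, 63, 64)

    # One self-attention encoder layer per channel models long-range
    # dependencies within each sequence independently.
    time_feats = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)(time_feats)
    freq_feats = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)(freq_feats)

    fused = CrossAttentionFusion(d_model)(freq_feats, time_feats)
    print(fused.shape)  # torch.Size([8, 63, 64])
```

Note that the query and key/value sequences may have different lengths, as here; the fused output inherits the length of the frequency-domain query sequence.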