Many current fault diagnosis methods rely on time-domain signals. Although these signals contain the richest information, their complexity makes them difficult for a network to learn and limits how fully they can be characterized. To address these issues, this paper proposes a novel Multi-channel Fused Vision Transformer Network (MFVTN). First, an Overlapping Patch Embedding (OPE) module is introduced that overlaps the patches of the time-domain map at their edges, preserving the globally continuous features of the map, and adds positional encoding to retain the patch order. This helps the Vision Transformer (ViT) merge detailed features and construct the global mapping. Second, multi-dimensional time-domain signal features are extracted and fused in parallel, enabling multi-domain fault diagnosis of bearings. To enhance the network's ability to extract domain-invariant features, an adversarial training strategy combined with the Wasserstein distance is used. The results show that the diagnostic accuracy of the proposed MFVTN reaches 98.2%.
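
The idea behind overlapping patch embedding can be illustrated with a minimal sketch. It is not the OPE module of this paper: the function names `overlapping_patches` and `add_position_index`, the patch size, the stride, and the use of a plain integer index in place of a learned positional encoding are all assumptions made for illustration. The sketch only shows that choosing a stride smaller than the patch size makes neighbouring patches share their edge regions.

```python
def overlapping_patches(image, patch, stride):
    """Split a 2-D grid (a list of equal-length rows) into flattened square patches.

    Hypothetical helper: with stride < patch, neighbouring patches share
    (patch - stride) rows or columns, so edge content appears in both.
    With stride == patch the patches do not overlap (the standard ViT split).
    """
    if not 0 < stride <= patch:
        raise ValueError("stride must satisfy 0 < stride <= patch")
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            # Flatten the patch row by row into one vector.
            flat = [image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)]
            patches.append(flat)
    return patches


def add_position_index(patches):
    """Pair each patch with its sequence index.

    Assumption: an integer index stands in for the positional encoding,
    which in a real network would be a learned or sinusoidal vector.
    """
    return list(enumerate(patches))


# Toy 6x6 "time-domain map" whose entries are their own flat indices.
toy_map = [[r * 6 + c for c in range(6)] for r in range(6)]
tokens = add_position_index(overlapping_patches(toy_map, patch=4, stride=2))
```

In this toy setting, horizontally adjacent patches share two columns, so content at a patch border is seen by both neighbours rather than being cut at the boundary.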