Intelligent fault diagnosis (IFD) methods based on domain adaptation (DA) have attracted great attention in recent years. The key idea of DA methods is to extract domain-invariant features. In most cases, a one-dimensional convolutional neural network (CNN) is adopted as the feature extractor, in which the kernel size is usually single and fixed. However, monitoring data for IFD usually contain information at various scales, so the feature representation extracted by such models may be incomplete. Moreover, the target domain data are used only to narrow the distribution discrepancy in an unsupervised way, which may cause the class information of the target domain to be ignored. To address these issues, this paper proposes a two-stage multi-scale domain adversarial fault diagnosis method. A multi-scale feature extractor with different kernel sizes is designed to acquire more discriminative domain-invariant features. Meanwhile, pseudo-label learning is adopted to supply the transfer learning process with pseudo labels for the target domain, which are generated by a pre-trained network in the first stage and then optimized iteratively in the second stage. The maximum mean discrepancy (MMD) is also adopted to strengthen the model's ability to align marginal distributions, which makes the model more robust. Thirty-eight transfer tasks from two different datasets were conducted to evaluate the effectiveness of the proposed method. The experimental results demonstrate that the proposed method achieves higher average diagnosis accuracy than several popular methods. The superiority of the proposed method is further explained by visualization of the learned features.
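As an illustration of the MMD term mentioned above, the following is a minimal sketch of a biased empirical estimate of the squared MMD between a batch of source-domain features and a batch of target-domain features. The abstract does not specify the kernel or its bandwidth, so the Gaussian kernel, the `bandwidth` parameter, and the function names `gaussian_kernel`, `mean_kernel`, and `mmd_squared` are assumptions made for illustration, not the method's actual implementation.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)).

    The kernel choice is an assumption; the abstract does not name one.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs: Sequence[Vector], ys: Sequence[Vector], bandwidth: float) -> float:
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source: Sequence[Vector], target: Sequence[Vector],
                bandwidth: float = 1.0) -> float:
    """Biased estimate of squared MMD: E[k(s, s')] + E[k(t, t')] - 2 E[k(s, t)]."""
    return (mean_kernel(source, source, bandwidth)
            + mean_kernel(target, target, bandwidth)
            - 2.0 * mean_kernel(source, target, bandwidth))


if __name__ == "__main__":
    # Toy two-dimensional features standing in for extractor outputs.
    source_features = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3]]
    target_features = [[2.0, 2.1], [2.2, 1.9], [1.9, 2.3]]
    print("MMD^2 (source vs. source):", mmd_squared(source_features, source_features))
    print("MMD^2 (source vs. target):", mmd_squared(source_features, target_features))
```

In a DA training loop, a quantity of this kind would typically be added to the loss so that minimizing it pulls the source and target feature distributions together; the estimate is zero for identical batches and grows as the two batches drift apart.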