Fault diagnosis plays a critical role in ensuring the safe operation of machinery. Multi-source domain adaptation (DA) leverages rich fault knowledge from source domains to enhance diagnostic performance on unlabeled target domains. However, most existing methods only align marginal distributions, neglecting inter-class relationships, which results in decreased performance under variable working conditions and small samples. To overcome these limitations, two stage multi-source domain adaptation (TSMDA) has been proposed for bearing fault diagnosis. Specifically, wavelet packet decomposition is applied to analyze fault information from signals. For small sample datasets, Diffusion is used to augment the dataset and serve as the source domain. Next, multi-scale features are extracted, and mutual information is computed to prevent the negative transfer. DA is divided into two stages. Firstly, multikernel maximum mean discrepancy is used to align the marginal distributions of the multi-source and target domains. Secondly, the target domain is split into subdomains based on the calculated pseudo-labels. Conditional distributions are aligned by minimizing the distance from samples to the center of the non-corresponding domain. The effectiveness of the proposed method is verified by extensive experiments on two public datasets and one experimental dataset. The results demonstrate that TSMDA has high and stable diagnostic performance and provides an effective method for practical fault diagnosis.