Recently, deep learning models, which are mainly data-driven, have received increasing attention in intelligent fault diagnosis and prognostics. However, existing studies accept two major assumptions by default: 1) the training (source domain) and testing (target domain) data sets follow the same feature distribution; 2) sufficient labeled data with fault information is available for model training. In real industrial scenarios, especially across different machines, these assumptions rarely hold, which makes building a reliable diagnostic model a major challenge. Motivated by transfer learning, we present a novel intelligent method, the deep transfer network (DTN) with multi-kernel dynamic distribution adaptation (MDDA), to address cross-machine fault diagnosis. In the proposed approach, the DTN uses a wide first-layer convolutional kernel followed by several small convolutional layers to extract transferable features across different machines and to suppress high-frequency noise. The MDDA method then constructs a weighted mixed kernel function that maps the different transferable features into a unified feature space, and it dynamically evaluates the relative importance of the marginal and conditional distributions. The proposed method is verified on three bearing transfer learning tasks, in which the health states of wind turbine bearings in a real scenario are identified using diagnosis knowledge from two different laboratory bearings. The results show that, even under different noise conditions, the proposed method achieves higher diagnosis accuracy and better transfer performance than many other state-of-the-art methods. The presented framework offers a promising approach for cross-machine fault diagnosis.

INDEX TERMS Deep transfer network, multi-kernel dynamic distribution adaptation, cross-machine fault diagnosis, transfer learning, bearings.
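
The idea behind the weighted mixed kernel and the dynamic weighting of marginal and conditional distributions can be illustrated with a minimal NumPy sketch. It is not the implementation used in this work. It assumes Gaussian base kernels, a biased squared maximum mean discrepancy (MMD) as the distribution distance, target pseudo-labels for the conditional term, and a balance factor `mu` supplied by the caller. The function names, bandwidths, and kernel weights are hypothetical choices made only for illustration.

```python
import numpy as np


def mixed_gaussian_kernel(x, y, bandwidths=(0.5, 1.0, 2.0), weights=None):
    """Weighted sum of Gaussian kernels between the rows of x and y.

    The bandwidths and the uniform default weights are illustrative
    assumptions, not values taken from the paper.
    """
    if weights is None:
        weights = np.full(len(bandwidths), 1.0 / len(bandwidths))
    # Squared Euclidean distance between every pair of rows.
    sq_dist = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return sum(w * np.exp(-sq_dist / (2.0 * s ** 2))
               for w, s in zip(weights, bandwidths))


def mmd2(xs, xt, **kernel_args):
    """Biased estimate of the squared MMD between two feature samples."""
    k_ss = mixed_gaussian_kernel(xs, xs, **kernel_args).mean()
    k_tt = mixed_gaussian_kernel(xt, xt, **kernel_args).mean()
    k_st = mixed_gaussian_kernel(xs, xt, **kernel_args).mean()
    return k_ss + k_tt - 2.0 * k_st


def dynamic_adaptation_loss(xs, ys, xt, yt_pseudo, mu, n_classes):
    """Blend marginal and conditional discrepancies with a factor mu in [0, 1].

    mu = 0 uses only the marginal term and mu = 1 only the conditional
    term. How mu is estimated during training is left open here.
    """
    marginal = mmd2(xs, xt)
    conditional = 0.0
    for c in range(n_classes):
        xs_c, xt_c = xs[ys == c], xt[yt_pseudo == c]
        # Skip classes that are missing from either domain.
        if len(xs_c) and len(xt_c):
            conditional += mmd2(xs_c, xt_c)
    return (1.0 - mu) * marginal + mu * conditional
```

In a training loop, a term of this kind would typically be added to the classification loss on the labeled source data, with `xs` and `xt` being the features extracted by the network from source and target batches. Passing `mu` explicitly keeps the sketch independent of any particular estimation rule.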