With the rapid development of artificial intelligence technologies, data-driven methods have contributed significantly to the intelligent monitoring and diagnosis of mechanical systems. However, state-of-the-art approaches, especially those based on deep learning, implicitly assume that large amounts of labeled fault data are available for supervised training, which is often infeasible because systems in the field are designed to be highly reliable. In this research, a deep transfer convolutional neural network (CNN) scheme is proposed to enhance diagnosis performance when training data in the target domain are insufficient. By utilizing transfer learning, rich and relevant feature representations can be learnt from massive data in the source domain. The weights and biases learnt in the source domain are transferred to the target task as initial parameter values. The transferred parameters are then fine-tuned with the small labeled datasets in the target domain. To avoid overfitting when labeled samples in the target domain are scarce, global average pooling (GAP) is introduced to replace the fully-connected layers, and the traditional CNN architecture is modified to reduce the number of trainable parameters. Finally, to cover transfer scenarios across diverse operating conditions and diverse machines, cross-machine transfer experiments are designed with three gearbox datasets provided by the Prognostics and Health Management (PHM) 2009 conference, Tsinghua University, and the University of Alberta. The results demonstrate the effectiveness of the proposed method with scarce labeled samples in the target domain.

KEYWORDS
convolutional neural network, deep transfer learning, gearbox fault diagnosis, global average pooling, scarcely labeled samples

1 | INTRODUCTION

Machinery condition monitoring and fault diagnosis is a classic problem in the manufacturing industry [1-4]. Generally, machinery fault diagnosis methods can be categorized into three classes: model-based methods, knowledge-based methods, and data-driven methods. Model-based methods, such as finite element simulation and system identification, perform fault diagnosis by monitoring the consistency between the real-time variables of the actual machine and the model-predicted values. However, due to the complex structure and working mechanism,