Significant advances have been made in fault detection using deep learning. However, labelling faults in seismic data requires considerable human effort, and the resulting small-sample problem makes it difficult for traditional deep learning methods to achieve the desired results. Existing research proposes training a deep learning model on labelled synthetic seismic data to obtain good fault detection results. However, because actual geological conditions are complex, synthetic and real seismic data inevitably differ in many respects, such as seismic signal frequency, the frequency of fault distribution and the degree of noise disturbance. As a result, a deep learning model trained on synthetic seismic data struggles to produce good fault detection results when applied to field data. To solve this problem, we propose using transfer learning to reduce the impact of these data differences: one part of the deep transfer learning model learns fault-related features, while the other part mines features common to the real and synthetic seismic data, which makes the model better suited to real seismic data. Compared with the latest research, our method greatly improves fault detection without requiring labels for the real data, which significantly reduces the cost of manual labelling.
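The abstract does not state how the two parts of the model are trained together. One common way to mine features shared by a labelled source domain (synthetic data) and an unlabelled target domain (real data) is to add a distribution-alignment term, such as the maximum mean discrepancy (MMD), to the supervised loss. The sketch below assumes that setup. The function names, the Gaussian kernel, the bandwidth `sigma` and the weight `lam` are hypothetical illustrations, not the loss used in this work. Features are rows of a 2-D array (for example, one pooled feature vector per seismic patch).

```python
import numpy as np


def gaussian_kernel(a, b, sigma=1.0):
    # Pairwise squared Euclidean distances between rows of a and rows of b
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    sq = np.maximum(sq, 0.0)  # guard against tiny negative values from rounding
    return np.exp(-sq / (2.0 * sigma**2))


def mmd2(source, target, sigma=1.0):
    """Biased estimate of the squared MMD between two feature sets (rows = samples)."""
    k_ss = gaussian_kernel(source, source, sigma).mean()
    k_tt = gaussian_kernel(target, target, sigma).mean()
    k_st = gaussian_kernel(source, target, sigma).mean()
    return float(k_ss + k_tt - 2.0 * k_st)


def bce(prob, label, eps=1e-7):
    # Binary cross-entropy for per-voxel fault probabilities
    prob = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(label * np.log(prob) + (1 - label) * np.log(1 - prob)))


def transfer_loss(syn_prob, syn_label, syn_feat, real_feat, lam=0.5, sigma=1.0):
    # Hypothetical combined objective (an assumption, not the paper's loss):
    # supervised fault term on labelled synthetic data only, plus an
    # alignment term that pulls synthetic and real feature distributions together.
    return bce(syn_prob, syn_label) + lam * mmd2(syn_feat, real_feat, sigma)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    syn_feat = rng.normal(size=(50, 4))
    real_feat = syn_feat + 3.0  # a shifted "real" domain
    syn_prob = rng.uniform(size=100)
    syn_label = (rng.uniform(size=100) > 0.5).astype(float)
    print(transfer_loss(syn_prob, syn_label, syn_feat, real_feat))
```

Minimizing the alignment term drives the shared feature extractor toward representations that look alike for both data sources, while the supervised term keeps those features useful for fault detection. Because the real data only enters through `mmd2`, no real-data labels are needed.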