Mechanical equipment often operates in complex working environments, where interference arising from operating conditions, environmental factors, and other components causes vibration signals to exhibit frequency distortion and multi-modality. Existing fault diagnosis methods rarely account for such external environmental interference. To address fault diagnosis under external environmental interference, a method is proposed that combines a Markov transition field (MTF) with enhanced properties and a multi-scale convolutional neural network with an attention mechanism (AM-MSCNN). The method extracts the fault features embedded in vibration signals that are subject to external environmental interference. First, an interference mode selection model based on symplectic geometry mode decomposition is constructed to address the distortion and multi-modality caused by external environmental interference. Next, a two-dimensional feature extraction method based on the MTF with enhanced properties is established. Markov transition probabilities are used to capture the temporal correlation features of one-dimensional vibration signals affected by interference, which mitigates the impact of the interference and gives the method strong anti-interference capability and robustness. Finally, an attention mechanism that adaptively assigns weights is designed and incorporated into the parallel layers of the MSCNN, yielding the AM-MSCNN model. The model effectively extracts global features, and the attention mechanism helps to suppress external environmental interference and improve the diagnostic results.
An experimental platform that simulates typical faults under external environmental interference is constructed, and the experimental results demonstrate that the proposed method generalizes well across interference environments of different types and severities. Under external interference working conditions, the overall average accuracy reaches 92.2% and the highest accuracy reaches 94.0%.
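To make the two-dimensional encoding step concrete, the following is a minimal sketch of a standard Markov transition field, not the enhanced variant proposed here, whose details are not given in this abstract. The function name `markov_transition_field`, the quantile binning scheme, and the default of 8 bins are illustrative assumptions.

```python
import numpy as np

def markov_transition_field(x, n_bins=8):
    """Standard (non-enhanced) MTF of a 1-D signal; hypothetical helper."""
    x = np.asarray(x, dtype=float)
    # Quantile bin edges (interior edges only), so bins are roughly equally populated.
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    # Map each sample to a discrete state in 0 .. n_bins-1.
    states = np.digitize(x, edges)
    # Count first-order transitions between consecutive samples.
    counts = np.zeros((n_bins, n_bins))
    np.add.at(counts, (states[:-1], states[1:]), 1)
    # Row-normalize counts into transition probabilities (empty rows stay zero).
    row_sums = counts.sum(axis=1, keepdims=True)
    W = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    # Entry (i, j) is the probability of moving from the state at time i
    # to the state at time j, which spreads W over the time axis.
    return W[states[:, None], states[None, :]]

# Example on a synthetic signal (assumed data, not from the experiments).
signal = np.sin(np.linspace(0.0, 8.0 * np.pi, 256))
mtf = markov_transition_field(signal, n_bins=8)
```

The resulting `mtf` array is a 256 x 256 image with values in [0, 1], which is the kind of two-dimensional input a convolutional network such as the AM-MSCNN could consume.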