To accurately diagnose faults in quayside container crane (QCC) gearboxes, this article proposes a method that combines the frequency-domain Markov transformation field (FDMTF) with a multi-branch residual convolutional neural network (MBRCNN). First, the gearbox vibration signal is transformed into the frequency domain, which presents the signal's components and their amplitudes stably and concisely. Second, the one-dimensional frequency signal is encoded into a two-dimensional image by the Markov transformation field to capture the dynamic characteristics of the signal. Third, the MBRCNN is constructed to extract multi-scale features and to alleviate the problems caused by a deep network structure. Finally, the FDMTF image is fed into the constructed MBRCNN model for pattern recognition. The effectiveness of the proposed FDMTF–MBRCNN method is verified by two case studies. In Case 1, the method achieves 100% diagnosis accuracy on a benchmark dataset, outperforming seven state-of-the-art methods published in the past three years. In Case 2, it achieves 98.85% accuracy on a dataset collected from a 1:4 scaled test rig, outperforming eleven encoding methods and four convolutional neural network methods. It also maintains a recognition accuracy above 94% with small samples, different network hyper-parameters, or variable loads, which verifies its robustness. These case studies indicate that the FDMTF–MBRCNN method has potential for application to the actual fault diagnosis of QCC gearboxes.
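The first two steps (frequency-domain conversion followed by Markov field encoding) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the binning scheme, the number of states, or the image size, so quantile binning, first-order transitions between adjacent frequency bins, and block averaging are assumptions here, and the names `fdmtf`, `n_bins`, and `image_size` are hypothetical.

```python
import numpy as np

def fdmtf(signal, n_bins=8, image_size=64):
    """Encode a 1-D signal as a frequency-domain Markov field image (sketch)."""
    # Step 1: one-sided amplitude spectrum of the vibration signal.
    spectrum = np.abs(np.fft.rfft(signal))
    # Trim so the spectrum length is a multiple of image_size (assumed choice).
    n = (len(spectrum) // image_size) * image_size
    if n == 0:
        raise ValueError("signal too short for the requested image_size")
    spectrum = spectrum[:n]

    # Step 2: map each amplitude to a quantile state in 0..n_bins-1 (assumed binning).
    edges = np.quantile(spectrum, np.linspace(0, 1, n_bins + 1)[1:-1])
    states = np.digitize(spectrum, edges)

    # Step 3: count transitions between adjacent frequency bins, then
    # normalize each row into transition probabilities.
    W = np.zeros((n_bins, n_bins))
    np.add.at(W, (states[:-1], states[1:]), 1)
    row_sums = W.sum(axis=1, keepdims=True)
    W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

    # Step 4: field[i, j] is the transition probability from the state at
    # position i to the state at position j.
    field = W[states[:, None], states[None, :]]

    # Step 5: block-average down to image_size x image_size (assumed reduction).
    b = n // image_size
    return field.reshape(image_size, b, image_size, b).mean(axis=(1, 3))

# Synthetic stand-in for a gearbox vibration signal: two tones plus noise.
rng = np.random.default_rng(0)
t = np.arange(1024) / 1024.0
signal = (np.sin(2 * np.pi * 50 * t)
          + 0.5 * np.sin(2 * np.pi * 120 * t)
          + 0.1 * rng.standard_normal(t.size))
image = fdmtf(signal)
```

The resulting `image` is a square array of transition probabilities that could serve as a single-channel input to a convolutional network such as the MBRCNN. In practice the number of states and the image size would be tuned to the dataset.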