Recognizing underwater targets from their radiated noise remains challenging because of the complexity of underwater environments. This paper proposes a multi-scale frequency-adaptive network for underwater target recognition. First, a three-channel improved Mel energy spectrum feature is designed, based on different distribution densities of Mel filters in the low-frequency band. Second, a frequency-adaptive module, an attention mechanism, and a multi-scale fusion module are combined into a multi-scale frequency-adaptive network to enhance the model’s learning ability. Third, model training is optimized by introducing a time–frequency mask, a data augmentation strategy involving data confounding, and a focal loss function. Finally, systematic experiments are conducted on the ShipsEar dataset. The results show a recognition accuracy of 98.4% for five categories and 88.6% for fine-grained recognition of nine categories. Compared with existing methods, the proposed network achieves a significant performance improvement.
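
As an illustration of the focal loss mentioned above, the following is a minimal sketch of the commonly used multi-class form, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), where p_t is the softmax probability of the true class. The abstract does not state which variant or hyperparameters are used, so the function names and the values `gamma=2.0` and `alpha=1.0` are assumptions chosen only for illustration, not the configuration of the proposed network.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample (gamma and alpha are assumed values).

    The factor (1 - p_t) ** gamma shrinks the loss of samples that are
    already classified with high confidence, so training concentrates
    on hard samples.
    """
    p_t = softmax(logits)[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(logits, target):
    """Standard cross-entropy, recovered from focal loss with gamma = 0."""
    return focal_loss(logits, target, gamma=0.0, alpha=1.0)


# Hypothetical five-class scores: one confident sample, one hard sample.
easy = [6.0, 0.0, 0.0, 0.0, 0.0]
hard = [0.0, 0.0, 0.0, 0.0, 0.0]

for name, logits in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(logits, 0)
    fl = focal_loss(logits, 0)
    print(f"{name}: cross-entropy={ce:.4f}, focal={fl:.4f}")
```

In this sketch the confident sample is down-weighted far more strongly than the hard one, which is the property that makes focal loss useful when some categories or samples are much easier to recognize than others.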