Underwater Acoustic Target Recognition (UATR) remains one of the most challenging tasks in underwater signal processing because of the scarcity of labeled data, intrinsic signal characteristics that vary in time and space, and interference from other noise sources. Although some deep learning methods have been shown to achieve state-of-the-art accuracy, recognition accuracy can be further improved by designing a Residual Network and optimizing feature extraction. To give a more comprehensive representation of the underwater acoustic signal, we first propose three-dimensional fusion features combined with the SpecAugment data augmentation strategy. An 18-layer Residual Network (ResNet18), which incorporates a center loss function applied to an embedding layer, is then trained on the aggregated features with an adaptive learning rate. Recognition experiments are conducted on a ship-radiated noise dataset recorded in a real environment, and the resulting accuracy of 94.3% indicates that the proposed method is well suited to underwater acoustic recognition problems and outperforms other classification methods.
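For readers unfamiliar with the center loss mentioned above, the following is a minimal pure-Python sketch of its standard formulation: the loss penalizes the squared distance between each embedding and the center of its class, and the centers are moved toward the class members after each batch. It is an illustration only, not the implementation used in this work. The function names `center_loss` and `update_centers`, the averaging over the batch, and the step size `alpha` are all assumptions made for the example.

```python
from typing import List, Sequence


def center_loss(embeddings: Sequence[Sequence[float]],
                labels: Sequence[int],
                centers: Sequence[Sequence[float]]) -> float:
    """Half the mean squared distance between each embedding and its class center.

    Averaging over the batch is an assumption of this sketch; a plain sum
    over the batch is also common.
    """
    total = 0.0
    for x, y in zip(embeddings, labels):
        c = centers[y]
        total += sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return 0.5 * total / len(embeddings)


def update_centers(embeddings: Sequence[Sequence[float]],
                   labels: Sequence[int],
                   centers: Sequence[Sequence[float]],
                   alpha: float = 0.5) -> List[List[float]]:
    """Move each class center toward the embeddings of that class.

    For class j: delta_j = sum_i (c_j - x_i) / (1 + n_j), then
    c_j <- c_j - alpha * delta_j, where n_j counts the samples of class j
    in the batch. alpha is a hypothetical step size chosen for illustration.
    """
    new_centers = [list(c) for c in centers]
    for j, c in enumerate(centers):
        members = [x for x, y in zip(embeddings, labels) if y == j]
        for d in range(len(c)):
            delta = sum(c[d] - x[d] for x in members) / (1 + len(members))
            new_centers[j][d] = c[d] - alpha * delta
    return new_centers


if __name__ == "__main__":
    # Two 2-D embeddings, one per class, with both centers at the origin.
    batch = [[1.0, 0.0], [0.0, 2.0]]
    batch_labels = [0, 1]
    class_centers = [[0.0, 0.0], [0.0, 0.0]]
    print(center_loss(batch, batch_labels, class_centers))
    print(update_centers(batch, batch_labels, class_centers))
```

In a classifier, this term is typically added to the softmax cross-entropy loss with a weighting factor, so that embeddings of the same class are pulled together while the cross-entropy term keeps different classes apart.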