Because of the low bandwidth of underwater acoustic communication, large color images cannot be transmitted in a timely manner. To address this problem, this paper builds on the deep-learning-based VQ-VAE-2 model and proposes two fine-tuned model structures, NVQ and VQI, which use convolutional networks to encode, compress, transmit, and reconstruct information-rich original images. The NVQ model reduces the features of the bottom-level quantization, and the VQI model re-quantizes the bottom-level quantization result; both significantly reduce the amount of transmitted data. The image compression ratio of the VQ-VAE-2 model is 13:1, while those of NVQ and VQI are 32:1 and 60:1, respectively. The SSIM (Structural Similarity) scores of VQ-VAE-2, NVQ, and VQI are 0.94, 0.91, and 0.87, respectively. In contrast to previous work, 1) we obtain visually pleasing reconstructions that are perceptually similar to the input, and 2) our approaches have short encoding and decoding times and generate character strings directly without additional overhead, so they achieve shorter transmission times than other methods.
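
The core operation behind all three models is vector quantization of latent feature maps: the encoder output is replaced by indices into a learned codebook, and only those indices need to cross the acoustic channel. The sketch below is an illustration of that idea, not the authors' implementation; the codebook size, latent dimension, and image size are assumed values, and the resulting ratio does not reproduce the paper's 13:1, 32:1, or 60:1 figures, which also depend on the hierarchical code structure and transmission format.

```python
import numpy as np

def vector_quantize(latents, codebook):
    """Map each D-dim latent vector to its nearest codebook row.

    latents:  (N, D) array of encoder outputs
    codebook: (K, D) array of learned code vectors
    returns:  (N,) integer indices and the (N, D) quantized vectors
    """
    # Squared Euclidean distance between every latent and every code vector.
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = dists.argmin(axis=1)
    return idx, codebook[idx]

def compression_ratio(image_shape, n_codes, codebook_size):
    """Rough ratio of raw 8-bit RGB bits to transmitted index bits."""
    raw_bits = image_shape[0] * image_shape[1] * 3 * 8
    index_bits = n_codes * np.ceil(np.log2(codebook_size))
    return raw_bits / index_bits

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))   # K=512 codes, D=64 (assumed sizes)
latents = rng.normal(size=(1024, 64))   # e.g. a 32x32 latent grid, flattened
idx, quantized = vector_quantize(latents, codebook)

# A 256x256 RGB image reduced to a 32x32 grid of 9-bit indices (illustrative).
print(round(compression_ratio((256, 256), 32 * 32, 512), 1))
```

Reducing the latent grid resolution (NVQ) or re-quantizing the bottom-level indices (VQI) shrinks `index_bits` further, which is how the two variants trade reconstruction quality (lower SSIM) for a higher compression ratio.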