Sound speed profiles are difficult to acquire efficiently, and measured profiles are not always representative of the conditions of interest. To make full use of historical sound speed profiles while accounting for their spatiotemporal variability, we propose a sound speed profile prediction model based on a CNN-BiLSTM-Attention network, which combines a convolutional neural network (CNN), a bidirectional long short-term memory network (BiLSTM), and an attention mechanism (AM). Together, these components allow the model to extract the spatiotemporal features of sound speed profiles more comprehensively. Using the global ocean Argo grid dataset, we applied the model to predict sound speed profiles in an experimental zone in the Western Pacific Ocean. For single-point prediction, the model achieved a root mean square error (RMSE), relative error (RE), and accuracy (ACC) of 0.72 m/s, 0.029%, and 0.99971, respectively, outperforming the comparison models. For regional prediction, the mean RMSE, RE, and ACC across the different water layers were 0.919 m/s, −0.016%, and 0.9995, respectively. These results confirm the high accuracy of the model and its advantage over the comparison models, and indicate that it offers an effective way to compensate when profile measurements are not available in time or contain representativeness errors.
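
To make the three reported metrics concrete, the following is a minimal sketch of how RMSE, RE, and ACC could be computed for one predicted profile. The abstract does not define RE or ACC, so the definitions here are assumptions: RE is taken as the signed mean relative error in percent, and ACC as one minus the mean absolute relative error. The function names and the sample values are hypothetical and are not data from the study.

```python
import math


def rmse(pred, true):
    """Root mean square error, in the units of the inputs (m/s here)."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def relative_error_percent(pred, true):
    """Signed mean relative error in percent (assumed definition)."""
    return 100.0 * sum((p - t) / t for p, t in zip(pred, true)) / len(true)


def accuracy(pred, true):
    """One minus the mean absolute relative error (assumed definition)."""
    return 1.0 - sum(abs(p - t) / t for p, t in zip(pred, true)) / len(true)


# Hypothetical sound speeds (m/s) at four depth levels; illustrative only.
true_ssp = [1540.0, 1520.0, 1500.0, 1490.0]
pred_ssp = [1540.5, 1519.4, 1500.3, 1489.2]

print(f"RMSE = {rmse(pred_ssp, true_ssp):.3f} m/s")
print(f"RE   = {relative_error_percent(pred_ssp, true_ssp):.4f} %")
print(f"ACC  = {accuracy(pred_ssp, true_ssp):.5f}")
```

Under these assumed definitions, a signed RE can be negative (the prediction is biased low on average) while ACC stays below 1, which is consistent with the sign of the regional RE reported above.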