Accurate and efficient forecasting of urban water supply is essential to urban water supply management. In this paper, a spatiotemporal deep learning model that integrates a convolutional neural network (CNN), a long short-term memory (LSTM) network, and an attention mechanism (AM) is proposed for predicting daily urban water supply. First, a one-dimensional CNN is used to identify latent patterns in the water supply system and automatically extract the spatial features of the water supply data. Second, the feature vectors output by the CNN are arranged as a time series and used as input to the LSTM network, whose parameters are searched and tuned using Bayesian optimization. Then, the AM is introduced into the LSTM network: weights are assigned to the hidden-layer outputs of the LSTM network, and their weighted sum is computed. Finally, the resulting CNN-LSTM-AM model captures the spatiotemporal information in the water supply data and uses it for prediction. On two different water supply datasets, the proposed CNN-LSTM-AM model yields lower mean absolute error, mean square error, and root mean square error than the conventional LSTM, CNN-LSTM, and LSTM-AM models. The model shows high forecasting accuracy and robustness, which are attributed to its spatiotemporal feature extraction.
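
To make the attention step and the three reported error measures concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that the weighted sum is taken over the per-time-step LSTM outputs and that the weights come from a softmax over dot-product scores. The names `attention_pool`, `query`, and `error_metrics` are hypothetical and introduced only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Return attention weights and the weighted sum of hidden states.

    hidden_states: list of T vectors, one per time step, each of length d.
    query: scoring vector of length d (assumed here; in a trained model it
           would be learned, and the scoring function may differ).
    """
    # Score each time step by its dot product with the query vector.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    # Normalize the scores so the weights are positive and sum to 1.
    weights = softmax(scores)
    # Weighted sum over time steps, computed per feature dimension.
    dim = len(query)
    context = [sum(w * h[i] for w, h in zip(weights, hidden_states)) for i in range(dim)]
    return weights, context


def error_metrics(y_true, y_pred):
    """Mean absolute error, mean square error, and root mean square error."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse)}
```

In a full model, the context vector returned by `attention_pool` would typically feed a final dense layer that outputs the forecast, and `error_metrics` would be evaluated on held-out observations for each compared model.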