Traffic flow forecasting is an important function of intelligent transportation systems. With the rise of deep learning, building traffic flow prediction models based on deep neural networks has become a research hotspot. Most current traffic flow prediction methods are designed from the perspective of model architecture and use only the traffic features of future moments as supervision signals to guide the model in learning the spatiotemporal dependence in traffic flow. However, traffic flow data themselves contain rich spatiotemporal features, and additional self-supervised signals can be obtained from the data to help the model further explore the underlying spatiotemporal dependence. Therefore, we propose a self-supervised traffic flow prediction method based on a spatiotemporal masking strategy. A framework consisting of symmetric backbone models with asymmetric task heads was applied to learn both prediction features and spatiotemporal context features. Specifically, a spatiotemporal context mask reconstruction task was designed to force the model to reconstruct the masked features from spatiotemporal context information, thereby helping the model better understand the spatiotemporal contextual associations in the data. To prevent the model from simply making inferences based on the local smoothness of the data without truly learning the spatiotemporal dependence, we performed a temporal shift operation on the features to be reconstructed. The experimental results showed that, compared with the backbone models, the model based on the spatiotemporal context masking strategy achieved an average prediction performance improvement of 1.56% and a maximum improvement of 7.72% for longer prediction horizons of more than 30 min.
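The mask reconstruction task with a temporal shift can be illustrated with a minimal sketch. The abstract does not specify the masking ratio, the shift length, the fill value, or the loss, so everything below is an assumption for illustration only: the input is taken to be a single-channel series of shape (time steps, sensors), masked entries are replaced by zero, the reconstruction target at step t is the true value at step t + shift, and the error is a mean absolute error over masked positions. The names `make_masked_sample` and `masked_mae` are hypothetical and are not taken from the paper.

```python
import random


def make_masked_sample(x, mask_ratio=0.25, shift=1, seed=0):
    """Build a masked input and a temporally shifted reconstruction target.

    x is a list of T time steps, each a list of N sensor readings.
    mask_ratio, shift, and the zero fill value are illustrative assumptions.
    """
    rng = random.Random(seed)
    num_steps, num_nodes = len(x), len(x[0])
    # Only steps whose shifted counterpart exists are eligible for masking.
    valid_steps = num_steps - shift
    mask = [
        [t < valid_steps and rng.random() < mask_ratio for _ in range(num_nodes)]
        for t in range(num_steps)
    ]
    # Masked entries are hidden from the model by replacing them with zero.
    x_masked = [
        [0.0 if mask[t][n] else x[t][n] for n in range(num_nodes)]
        for t in range(num_steps)
    ]
    # Temporal shift: the target at step t is the true value at step t + shift,
    # so plain interpolation between neighbouring steps does not solve the task.
    target = [
        [x[t + shift][n] if t < valid_steps else 0.0 for n in range(num_nodes)]
        for t in range(num_steps)
    ]
    return x_masked, target, mask


def masked_mae(pred, target, mask):
    """Mean absolute error computed over masked positions only."""
    errors = [
        abs(p - y)
        for pred_row, target_row, mask_row in zip(pred, target, mask)
        for p, y, m in zip(pred_row, target_row, mask_row)
        if m
    ]
    return sum(errors) / len(errors) if errors else 0.0
```

In a full training setup, a reconstruction loss of this kind would presumably be combined with the usual forecasting loss from the prediction head; how the two are weighted is not stated in the abstract.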