Multivariate time series (MTS) prediction aims to forecast future values of a time series by extracting multiple forms of dependency from its past values. Traditional and deep learning-based prediction methods focus on the dynamic relationships of only certain aspects of MTS, especially the temporal characteristics, and often neglect the joint spatial and temporal dynamic correlations. Inspired by convolutional neural networks (CNNs) and the attention mechanism, this paper proposes a convolutional LSTM network model with two-stage attention for MTS prediction. Specifically, we first propose a new MTS preprocessing method that makes the data better suited to convolution operations. A convolution layer then extracts the spatial correlation of the MTS, and an LSTM model extracts the temporal correlation. Combining the attention mechanism with the LSTM effectively addresses the problem of insufficiently captured temporal dependencies in MTS prediction. In addition, the two-stage attention mechanism filters out irrelevant information: it selects the relevant exogenous sequences and gives them higher weights, and it then incorporates the past values of the target sequence to eliminate irrelevant information further. Finally, the spatio-temporal correlation of the MTS is extracted to improve prediction accuracy, and the model is interpreted. Experiments on typical datasets from finance, environment, and energy determine the optimal window size and hidden size for prediction and demonstrate that the model achieves state-of-the-art performance compared with four other deep learning models, which suggests broad application prospects. Moreover, the model is suitable not only for single-step prediction of MTS but also for multistep prediction within a certain range of time steps.
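To make the roles of the window size and the prediction horizon concrete, the following minimal Python sketch shows one common way to slice an MTS into supervised samples. It is an illustration under stated assumptions, not the preprocessing method proposed in this paper: the function name `make_windows` and the parameters `window` and `horizon` are hypothetical, and the exogenous sequences and the target sequence are assumed to be aligned lists of equal length.

```python
from typing import List, Sequence, Tuple

# One sample: (exogenous window, past target values, future target values).
Sample = Tuple[List[List[float]], List[float], List[float]]


def make_windows(
    exog: Sequence[Sequence[float]],
    target: Sequence[float],
    window: int,
    horizon: int = 1,
) -> List[Sample]:
    """Slice aligned exogenous/target series into sliding-window samples.

    Hypothetical helper for illustration only.
    exog[t] holds the exogenous values at time step t; target[t] is the
    target value at time step t. horizon=1 corresponds to single-step
    prediction, horizon>1 to multistep prediction.
    """
    if len(exog) != len(target):
        raise ValueError("exog and target must have the same length")
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")

    samples: List[Sample] = []
    # The last valid start index leaves room for `window` past steps
    # followed by `horizon` future steps.
    last_start = len(target) - window - horizon
    for start in range(last_start + 1):
        end = start + window
        samples.append(
            (
                [list(row) for row in exog[start:end]],  # exogenous inputs
                list(target[start:end]),                 # past target values
                list(target[end:end + horizon]),         # values to predict
            )
        )
    return samples


# Usage: 6 time steps, 2 exogenous sequences, window of 3, 2-step horizon.
exog_demo = [[float(t), 10.0 * t] for t in range(6)]
target_demo = [float(t) for t in range(6)]
demo_samples = make_windows(exog_demo, target_demo, window=3, horizon=2)
```

Each sample pairs a window of exogenous values and past target values with the future target values to be predicted, so the same routine covers the single-step case (`horizon=1`) and the multistep case (`horizon>1`).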