The number of network devices and servers and the volume of network traffic are growing exponentially, and how operators allocate and efficiently use network resources has attracted considerable attention from traffic forecasting researchers. With the advent of the 5G era, network traffic has grown explosively and network complexity has increased dramatically, making accurate network traffic prediction a pressing problem. In this paper, we propose MECG, a multilayer perceptron ensemble learning method based on spatiotemporal feature extraction with convolutional neural networks (CNN) and gated recurrent units (GRU), for network traffic prediction. First, we extract spatial and temporal features of the data with a CNN and a GRU-based recurrent neural network (RNN). Then, the extracted temporal and spatial features are fused into new spatiotemporal features through ensemble learning with a multilayer perceptron, and a spatiotemporal prediction model is built in the sequence-to-sequence framework. In addition, a teacher forcing mechanism and an attention mechanism are added to improve the accuracy and convergence speed of the model. Finally, the proposed method is compared experimentally with other deep learning models. The results show that the proposed method not only has a clear advantage in accuracy but also shows some advantage in training time cost.
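To make the roles of the attention and teacher forcing mechanisms concrete, the following is a minimal, dependency-free sketch of a sequence-to-sequence decoding loop. It is an illustration under stated assumptions, not the MECG implementation: the names `attend`, `toy_step`, `decode`, and `teacher_forcing_ratio`, the dot-product form of attention, the zero start input, and the hand-written `toy_step` (a stand-in for a trained GRU decoder cell) are all hypothetical choices made here for clarity.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden, encoder_states):
    """Dot-product attention (assumed form): weight each encoder state
    by its similarity to the current decoder hidden state."""
    scores = [sum(h * e for h, e in zip(hidden, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(hidden)
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return context, weights


def toy_step(prev_input, hidden, context):
    """Hypothetical stand-in for a trained GRU decoder cell."""
    new_hidden = [0.5 * h + 0.5 * c for h, c in zip(hidden, context)]
    pred = 0.5 * prev_input + sum(new_hidden) / len(new_hidden)
    return pred, new_hidden


def decode(encoder_states, targets, teacher_forcing_ratio, rng, start=0.0):
    """Decode len(targets) steps. With probability teacher_forcing_ratio the
    next decoder input is the ground-truth value; otherwise it is the
    decoder's own previous prediction."""
    hidden = list(encoder_states[-1])  # initialise from the last encoder state
    prev_input = start
    preds, inputs = [], []
    for target in targets:
        inputs.append(prev_input)
        context, _ = attend(hidden, encoder_states)
        pred, hidden = toy_step(prev_input, hidden, context)
        preds.append(pred)
        use_truth = rng.random() < teacher_forcing_ratio
        prev_input = target if use_truth else pred
    return preds, inputs


# Toy data: three encoder states of dimension 2 and a three-step target.
encoder_states = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4]]
targets = [1.0, 2.0, 3.0]
rng = random.Random(0)
forced_preds, forced_inputs = decode(encoder_states, targets, 1.0, rng)
free_preds, free_inputs = decode(encoder_states, targets, 0.0, rng)
```

With `teacher_forcing_ratio=1.0` the decoder always sees the ground-truth value from the previous step, which is what typically speeds up convergence during training; with `0.0` it consumes its own predictions, as it must at inference time. Intermediate ratios mix the two regimes.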