Predicting pedestrian crossing behavior is an important problem in realizing autonomous driving. Current research on pedestrian crossing behavior prediction relies mainly on vehicle-mounted cameras. However, a vehicle camera's line of sight may be blocked by other vehicles or by the road environment, making it difficult to obtain key information about the scene. Prediction based on surveillance video can be deployed on key road sections or in accident-prone areas to provide supplementary information for vehicle decision-making, thereby reducing the risk of accidents. To this end, we propose a pedestrian crossing behavior prediction network for surveillance video. The network integrates pedestrian posture, local context, and global context features through a new cross-stacked gated recurrent unit (GRU) structure to achieve accurate prediction of pedestrian crossing behavior. On the surveillance video dataset from the University of California, Berkeley, our model achieves the best results in accuracy, F1 score, and other metrics. In addition, we conduct experiments to study the effects of time to prediction and pedestrian speed on prediction accuracy. This paper demonstrates the feasibility of pedestrian crossing behavior prediction based on surveillance video and provides a reference for applying edge computing to safety assurance in autonomous driving.