Frame prediction methods based on an auto-encoder (AE) built from convolutional neural networks (CNNs) are widely used for detecting abnormal behaviour. These methods predict normal behaviour accurately and abnormal behaviour poorly, so the prediction error serves as the criterion for discriminating abnormality. However, such methods still need improvement: the representation power of the AE can be so strong that abnormal frames are also predicted well and detection fails, the network's ability to extract spatio-temporal information is insufficient, the number of model parameters is large, and the running speed is slow. In this work, the authors propose MMP3D, a network framework for abnormal behaviour detection in video based on a pseudo-3D encoder and a multi-cascade memory mechanism. First, an encoder composed of pseudo-3D convolutions extracts spatio-temporal information from the video. Then, the multi-cascade memory mechanism (MM) and a multi-headed prototype attention mechanism store and aggregate features of normal behaviour, which alleviates to some extent the detection failure caused by the strong representation power of the AE. Finally, a decoder composed of 2D deconvolution layers recovers the prediction information. The efficiency and superiority of the proposed method are validated on the Ped2, Avenue, and ShanghaiTech datasets.
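
To illustrate why storing features of normal behaviour can limit an overly strong AE, the following is a minimal sketch of a generic, single-level memory read using only the Python standard library. It is not the authors' multi-cascade memory or multi-headed prototype attention, whose details are not given here. The function names (`cosine`, `softmax`, `memory_read`, `sq_error`), the cosine-similarity addressing, the `temperature` value, and the toy three-dimensional features are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)  # small epsilon avoids division by zero


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_read(query, memory, temperature=0.1):
    """Rebuild `query` as a convex combination of the stored memory items.

    Assumption: items are addressed by cosine similarity; the abstract
    does not state the addressing rule used in MMP3D.
    """
    scores = [cosine(query, item) / temperature for item in memory]
    weights = softmax(scores)
    read = [
        sum(w * item[d] for w, item in zip(weights, memory))
        for d in range(len(query))
    ]
    return read, weights


def sq_error(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical memory holding two prototypes of normal behaviour.
memory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

normal_query = [0.9, 0.1, 0.0]  # close to the first prototype
odd_query = [0.0, 0.0, 1.0]     # unlike anything stored

normal_read, normal_w = memory_read(normal_query, memory)
odd_read, odd_w = memory_read(odd_query, memory)

normal_err = sq_error(normal_query, normal_read)
odd_err = sq_error(odd_query, odd_read)
print(normal_err < odd_err)
```

Because the read-out is a weighted sum of stored normal prototypes, a feature unlike anything in memory cannot be reproduced faithfully. In this toy setting, that keeps the error for the unfamiliar query larger than the error for the normal one.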