Surveillance video summarisation extracts the video segments that contain abnormal events from surveillance footage, so accurate identification of abnormal events is of paramount importance. Accordingly, the proposed framework builds an aggregated convolutional recurrent model that detects suspicious events in surveillance footage using supervised learning, which is found to yield better results than unsupervised counterparts. The first stage of the model is a multilayer Convolutional Neural Network (CNN) for frame-level feature extraction, followed by a stacked bidirectional Gated Recurrent Unit (GRU) for sequence-level feature extraction and classification. Since the video clips used for training are not specific to surveillance, a block-based approach for testing on surveillance videos is proposed. Results on two custom datasets, Streets and Campus, show that the proposed model performs well by leveraging the properties of bidirectional GRUs with supervised learning. Extensive experimental analysis on the selection of the optimum architecture substantiates the advantage of stacked bidirectional GRUs over unidirectional ones. Additionally, qualitative results indicate that the summaries produced are concise, representative, complete, diverse and informative. Moreover, a comparison with state-of-the-art methods indicates that the proposed model outperforms them.

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
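The block-based testing idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the block length, whether blocks overlap, or how a block is judged abnormal, so the fixed-length non-overlapping blocks, the `threshold` value, and the names `split_into_blocks`, `summarise` and `classify_block` are all assumptions made here for illustration. `classify_block` is a hypothetical stand-in for the trained CNN and bidirectional GRU classifier.

```python
from typing import Callable, List, Sequence, Tuple


def split_into_blocks(num_frames: int, block_size: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, that cover the video.

    Assumption: blocks are fixed-length and non-overlapping; the last block
    may be shorter than block_size.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [(start, min(start + block_size, num_frames))
            for start in range(0, num_frames, block_size)]


def summarise(frames: Sequence,
              classify_block: Callable[[Sequence], float],
              block_size: int = 16,
              threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return the frame ranges whose blocks are scored as abnormal.

    classify_block is a hypothetical callable that maps a block of frames to
    an abnormality score in [0, 1]. Adjacent abnormal blocks are merged into
    one segment so the summary is not fragmented.
    """
    segments: List[Tuple[int, int]] = []
    for start, end in split_into_blocks(len(frames), block_size):
        score = classify_block(frames[start:end])
        if score < threshold:
            continue  # block looks normal, leave it out of the summary
        if segments and segments[-1][1] == start:
            # extend the previous segment instead of starting a new one
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Toy example: integers stand in for frames; values >= 4 are "abnormal".
    toy_frames = list(range(10))
    toy_classifier = lambda block: 1.0 if max(block) >= 4 else 0.0
    print(summarise(toy_frames, toy_classifier, block_size=4))
```

In this sketch the summary is a list of frame ranges rather than a rendered video. Merging adjacent abnormal blocks is one possible design choice for keeping the summary concise, not something the abstract states.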