Surveillance systems that capture video and audio in enterprise facilities and public places produce massive amounts of data while operating in 24/7 mode. There is an increasing need to process such huge video and audio streams on the fly, so as to quickly summarize "interesting" events occurring during a specified time frame at a particular location. Concepts such as fog computing, based on the localisation of data processing, relieve existing cloud-based solutions of their extensive bandwidth and processing demands on remote cloud resources; however, data-processing capabilities at the extreme edge are limited by the hardware of the devices. In this paper, we describe a novel, adaptive architecture that builds on a distributed computing paradigm and is well suited to smart surveillance systems that can utilize resources at the cloud, fog and edge. We present the main architectural components, the hardware options and the key software components of the system. In the proposed architecture, edge computing is realized by an embedded camera system, cloud computing by publicly accessible infrastructure for data processing, and fog computing by the processing and fusion of video streams within small areas.
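The tiered division of labour described above can be sketched as a simple placement policy: a task runs at the lowest (closest) tier whose compute capacity and latency satisfy its requirements. This is a minimal illustrative sketch, not the paper's actual scheduler; all names, capacity figures and latency figures below are assumptions for illustration.

```python
from dataclasses import dataclass

# Hypothetical per-tier capacity and latency figures; these numbers
# are illustrative assumptions, not values from the paper.
TIER_CAPACITY_GFLOPS = {"edge": 10, "fog": 200, "cloud": 10_000}
TIER_LATENCY_MS = {"edge": 5, "fog": 20, "cloud": 150}

@dataclass
class Task:
    """A video-analytics workload with its compute and latency needs."""
    name: str
    required_gflops: float
    max_latency_ms: float

def place_task(task: Task) -> str:
    """Pick the closest tier that satisfies both the compute and
    latency requirements, preferring edge over fog over cloud."""
    for tier in ("edge", "fog", "cloud"):
        if (TIER_CAPACITY_GFLOPS[tier] >= task.required_gflops
                and TIER_LATENCY_MS[tier] <= task.max_latency_ms):
            return tier
    raise ValueError(f"no tier can host task {task.name!r}")

# Example: lightweight motion detection fits on the camera itself,
# multi-camera stream fusion lands on the fog node covering a small
# area, and heavyweight analysis goes to the cloud.
print(place_task(Task("motion-detect", 2, 50)))       # edge
print(place_task(Task("stream-fusion", 100, 100)))    # fog
print(place_task(Task("heavy-analysis", 5_000, 1_000)))  # cloud
```

Under this policy, only tasks that exceed the camera's hardware limits are escalated, which is one way to realize the bandwidth and processing relief that the abstract attributes to localised (fog) processing.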