Nowadays, deep learning (DL)-based video surveillance services are widely used in smart cities because they can accurately identify and track objects, such as vehicles and pedestrians, in real time, enabling more efficient traffic management and improved public safety. However, DL-based video surveillance services that require object movement and motion tracking (e.g., for detecting abnormal object behaviors) can consume substantial computing and memory capacity, such as (i) GPU computing resources for model inference and (ii) GPU memory resources for model loading. This paper presents a novel cognitive video surveillance management framework with a long short-term memory (LSTM) model, denoted as CogVSM. We consider DL-based video surveillance services in a hierarchical edge computing system. The proposed CogVSM forecasts object appearance patterns and smooths the forecast results to enable adaptive model release. Here, we aim to reduce standby GPU memory through model release while avoiding unnecessary model reloads when an object suddenly appears. To achieve these objectives, CogVSM relies on an LSTM-based deep learning architecture explicitly designed to predict future object appearance patterns after being trained on previous time-series patterns. Based on the LSTM prediction results, the proposed framework dynamically controls the threshold time value by using an exponentially weighted moving average (EWMA) technique. Comparative evaluations on both simulated and real-world measurement data from commercial edge devices show that the LSTM-based model in CogVSM achieves high predictive accuracy, i.e., a root-mean-square error (RMSE) of 0.795. In addition, the proposed framework uses up to 32.1% less GPU memory than the baseline and 8.9% less than previous work.
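The EWMA-based threshold control can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the CogVSM implementation: it assumes the LSTM emits a per-interval object appearance likelihood in [0, 1], and the linear mapping from the smoothed likelihood to a threshold time, together with the parameters `alpha`, `t_min`, and `t_max`, is hypothetical.

```python
from typing import Iterable, List, Optional


def ewma(values: Iterable[float], alpha: float) -> List[float]:
    """Exponentially weighted moving average: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[float] = []
    s: Optional[float] = None
    for x in values:
        # The first value seeds the average; later values are blended in.
        s = x if s is None else alpha * x + (1.0 - alpha) * s
        smoothed.append(s)
    return smoothed


def threshold_times(
    forecasts: Iterable[float], alpha: float = 0.5, t_min: float = 5.0, t_max: float = 60.0
) -> List[float]:
    """Hypothetical mapping from smoothed appearance likelihood to a threshold time (s).

    A higher smoothed likelihood keeps the model loaded longer, which avoids
    reloading it for an object that is likely to reappear soon.
    """
    times: List[float] = []
    for p in ewma(forecasts, alpha):
        p = min(max(p, 0.0), 1.0)  # clamp to [0, 1]
        times.append(t_min + p * (t_max - t_min))
    return times


if __name__ == "__main__":
    lstm_forecasts = [0.0, 1.0, 1.0, 0.0]  # stand-in for LSTM outputs
    print(threshold_times(lstm_forecasts, alpha=0.5))  # [5.0, 32.5, 46.25, 25.625]
```

Smoothing the forecasts before deriving the threshold keeps a single abrupt prediction from causing the threshold time, and hence the release decision, to oscillate; a smaller `alpha` gives a smoother but slower-reacting threshold.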
This paper presents a novel adaptive object movement and motion tracking (AdaMM) framework in a hierarchical edge computing system for reducing the GPU memory footprint of deep learning (DL)-based video surveillance services. DL-based object movement and motion tracking requires a significant amount of resources, such as (1) GPU processing power for the inference phase and (2) GPU memory for model loading. Even when no object is present in the video, the GPU memory must remain allocated as long as the DL model is loaded. Moreover, video surveillance often tries to capture events that rarely occur (e.g., abnormal object behaviors); therefore, such standby GPU memory can easily be wasted. To alleviate this problem, the proposed AdaMM framework categorizes the tasks used in the object movement and motion tracking procedure, in increasing order of required processing and memory resources, as task (1) frame difference calculation, task (2) object detection, and task (3) object movement and motion tracking. The proposed framework adaptively releases the unnecessary standby object movement and motion tracking model to save GPU memory by using light tasks, such as frame difference calculation and object detection, in a hierarchical manner. Consequently, object movement and motion tracking are adaptively triggered if an object is detected within the specified threshold time; otherwise, the GPU memory for the task (3) model can be released. Similarly, object detection is adaptively performed if the frame difference over time exceeds the specified threshold. We implemented the proposed AdaMM framework on commercial edge devices in a three-tier system, namely the 1st edge node for tasks (1) and (2), the 2nd edge node for task (3), and the cloud for sending push alarms.
A measurement-based experiment reveals that the proposed framework achieves a maximum GPU memory reduction of 76.8% compared to the baseline system, while requiring a 2680 ms delay for loading the model for object movement and motion tracking.
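The hierarchical gating of tasks (1) to (3) can be sketched as a small controller. This is a minimal sketch under stated assumptions, not the AdaMM code: frames are tiny grayscale pixel grids, task (1) is taken to be a mean absolute pixel difference, the detector is a hypothetical stub passed in as `detect`, and loading or releasing the task (3) model is represented by a flag rather than real GPU memory management. The names `AdaptiveTracker`, `diff_threshold`, and `release_after` are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple

# Assumption: a grayscale frame is a 2-D grid of pixel intensities.
Frame = Sequence[Sequence[int]]


def frame_difference(prev: Frame, curr: Frame) -> float:
    """Task (1): mean absolute pixel difference between two frames."""
    total = 0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0


@dataclass
class AdaptiveTracker:
    detect: Callable[[Frame], bool]  # task (2): hypothetical detector stub
    diff_threshold: float            # hypothetical frame-difference threshold
    release_after: float             # threshold time (s) before releasing the task (3) model
    model_loaded: bool = False       # stands in for GPU memory held by the task (3) model
    last_detection: Optional[float] = None
    prev_frame: Optional[Frame] = None
    events: List[Tuple[str, float]] = field(default_factory=list)

    def step(self, t: float, frame: Frame) -> None:
        detected = False
        # Run task (2) only when task (1) reports a large enough scene change.
        if self.prev_frame is not None:
            if frame_difference(self.prev_frame, frame) > self.diff_threshold:
                detected = self.detect(frame)
        self.prev_frame = frame

        if detected:
            self.last_detection = t
            if not self.model_loaded:
                self.model_loaded = True  # load the task (3) model on demand
                self.events.append(("load", t))
            self.events.append(("track", t))  # task (3): movement/motion tracking
        elif (
            self.model_loaded
            and self.last_detection is not None
            and t - self.last_detection > self.release_after
        ):
            self.model_loaded = False  # release the standby model's GPU memory
            self.events.append(("release", t))


if __name__ == "__main__":
    still = [[0, 0], [0, 0]]
    moving = [[255, 255], [0, 0]]
    tracker = AdaptiveTracker(detect=lambda f: True, diff_threshold=10.0, release_after=2.0)
    for t, frame in enumerate([still, moving, moving, moving, moving, moving]):
        tracker.step(t, frame)
    print(tracker.events)  # [('load', 1), ('track', 1), ('release', 4)]
```

In this toy run, the scene change at t = 1 triggers detection and loads the tracking model; the scene then stays static, no further detections occur, and the model is released once more than `release_after` seconds have passed since the last detection. The cheap tasks (1) and (2) thus act as gates that decide when the expensive task (3) model needs to occupy GPU memory.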