We aim to perform anomaly detection in video streams as quickly as possible, without hand-engineering features to suit particular scenes. Any scene captured by a surveillance camera may contain one or more persons (agents) and several concurrent activities, with or without human-object and/or human-human interactions. These characteristics make for an interesting problem that draws on techniques and insights from four domains: anomaly detection, activity recognition, sequence modeling, and deep learning. First, video frames must be represented as a set of features. Next, the temporal sequence and the spatio-temporal relations within it must be modeled. Finally, the system is trained with a machine learning algorithm on a training set of sequences. The trained system should then be able to flag when an anomaly occurs in the input stream. This is challenging, however, because of large variations in environment and human movement, and because the notion of an anomaly is only vaguely defined in video surveillance. In this paper, we offer insights into how techniques from these four domains can be applied to video-based anomaly detection.
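The three-stage pipeline in the abstract (per-frame features, a temporal model, training on normal sequences, then flagging deviations) can be made concrete with a deliberately simple sketch. The abstract targets learned deep features and sequence models. This sketch is not that method. It assumes per-frame feature vectors have already been extracted by some encoder, and it swaps in a naive temporal model that predicts each frame's features to equal the previous frame's. It also uses a z-score threshold on the prediction error fitted to normal training sequences. The class name `NaiveTemporalAnomalyDetector` and the `threshold` parameter are hypothetical.

```python
import math
from typing import List, Sequence

# A per-frame feature vector; assumed to come from some (unspecified) encoder.
Vector = Sequence[float]


def step_errors(seq: Sequence[Vector]) -> List[float]:
    """Prediction errors of a naive temporal model that predicts each frame's
    features to equal the previous frame's (Euclidean distance)."""
    return [math.dist(prev, cur) for prev, cur in zip(seq, seq[1:])]


class NaiveTemporalAnomalyDetector:
    """Hypothetical illustration: learn the error statistics of normal
    sequences, then flag frames whose error z-score exceeds a threshold."""

    def __init__(self, threshold: float = 3.0) -> None:
        self.threshold = threshold
        self.mean = 0.0
        self.std = 1.0

    def fit(self, normal_sequences: Sequence[Sequence[Vector]]) -> None:
        # Pool prediction errors from all normal (training) sequences.
        errors = [e for seq in normal_sequences for e in step_errors(seq)]
        self.mean = sum(errors) / len(errors)
        var = sum((e - self.mean) ** 2 for e in errors) / len(errors)
        # Guard against a zero standard deviation on perfectly regular data.
        self.std = math.sqrt(var) or 1e-9

    def score(self, seq: Sequence[Vector]) -> List[float]:
        # z-score of each frame-to-frame prediction error.
        return [(e - self.mean) / self.std for e in step_errors(seq)]

    def detect(self, seq: Sequence[Vector]) -> List[int]:
        # Score i compares frame i with frame i+1, so report frame i+1.
        return [i + 1 for i, z in enumerate(self.score(seq)) if z > self.threshold]


if __name__ == "__main__":
    det = NaiveTemporalAnomalyDetector(threshold=3.0)
    det.fit([[[0.0], [1.0], [3.0], [4.0], [6.0]]])
    print(det.detect([[0.0], [1.0], [3.0], [13.0], [14.0]]))  # [3]
```

In the usage example, normal training steps have errors of 1 or 2 (mean 1.5, standard deviation 0.5). The jump from 3 to 13 gives an error of 10 and a z-score of 17, so frame 3 is flagged. In a real system, the previous-frame predictor would be replaced by a learned sequence model and the 1-D features by learned representations. The thresholding step on a "normal behavior" error distribution would remain the same.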