We propose an approach to detecting violence in CCTV feeds that is robust to new datasets and situations, and that breaks with the traditional assumption that large amounts of representative training data are available. Detecting violence in CCTV feeds is a hard problem whose solution is of paramount importance for effective situational understanding. Violence comprises a large spectrum of activities, ranging from abuse to fighting to road accidents, and can therefore take place in completely different environments: public buildings, underground stations, or roads, by day or by night. This is therefore one of those tasks at which humans excel, while machines still lag behind. We show that there are specific, detectable, and measurable features of video feeds that correlate with, among other things, violence and that, by fusing such features with semantic knowledge, we can in principle provide estimates of which video sequences correlate with violence.
Index Terms: uncertain sources, complex event processing
I. INTRODUCTION
Detecting violence in CCTV feeds is a hard problem whose solution is of paramount importance for effective situational understanding. Situational understanding requires both insight and foresight. In its traditional definition [1] it is the "product of applying analysis and judgement to the unit's situation awareness to determine the relationships of the factors present, and form logical conclusions concerning threats to the mission accomplishment, opportunities for mission accomplishment, and gaps in information." The UK Ministry of Defence Doctrine [2] goes further, explicitly mentioning that (situational) "understanding involves acquiring and developing knowledge to a level that enables us to know why something has happened or is happening (insight) and be able to identify and anticipate what may happen (foresight)."
Violence, in particular, comprises a large spectrum of activities, ranging from abuse to fighting to road accidents, and can therefore take place in completely different environments: public buildings, underground stations, or roads, by day or by night. This is therefore one of those tasks at which humans excel, while machines still lag behind.
It is therefore surprising that state-of-the-art approaches [3] assume, as is traditional, the existence of a large set