This paper presents an audio-video surveillance system for the automatic surveillance in public transport vehicle. The system comprises six modules including in particular three novel ones: (i) Face Detection and Tracking, (ii) Audio Event Detection and (iii) Audio-Video Scenario Recognition. The Face Detection and Tracking module is responsible for detecting and tracking faces of people in front of cameras. The Audio Event Detection module detects abnormal audio events which are precursor for detecting scenarios which have been predefined by end-users. The Audio-Video Scenario Recognition module performs high level interpretation of the observed objects by combining audio and video events based on spatio-temporal reasoning. The performance of the system is evaluated for a series of pre-defined audio, video and audio-video events specified using an audio-video event ontology.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.