Video surveillance systems play a vital role in commercial and industrial environments. A central goal of surveillance is to detect suspicious behavior of humans and objects in a scene using cameras or other sensors. Most current surveillance systems do this by identifying persons and tracking their individual paths independently, not in conjunction with the objects in the scene. However, in a real-world surveillance scenario, the behavior of people and their interactions with objects need to be modeled in order to reason about suspicious activities. Our contribution in this work is to use the state-of-the-art Structural Recurrent Neural Network (SRNN) method to model complex spatio-temporal human-object interactions in surveillance, with weapons as the objects of interest. Our best results reach an F1 score of 87.3 on the human sub-activity recognition task and 82.7 on the object affordance recognition task.