“…The long-term vision is to benchmark against the capabilities of human cinematographers, real-time video editors, and surveillance personnel in recording and semantically annotating individual and group activity (e.g., for summarisation, story-book format digital media, and promo generation). […] interpretation, robotic plan generation, semantic model generation from video, ambient intelligence and smart environments (e.g., see narrative-based models in [Hajishirzi et al., 2012; Hajishirzi and Mueller, 2011; Mueller, 2007; Bhatt and Flanagan, 2010; Dubba et al., 2011; Eppe and Bhatt, 2013]).…”