We propose a novel framework that uses a nonparametric Bayesian model, called Dual Hierarchical Dirichlet Processes (Dual-HDP) (Wang et al. in IEEE Trans. Pattern Anal. Mach. Intell. 31:539-555, 2009), for unsupervised trajectory analysis and semantic region modeling in surveillance settings. In our approach, trajectories are treated as documents and the observations of an object along a trajectory are treated as words in a document. Trajectories are clustered into different activities, and abnormal trajectories are detected as samples with low likelihoods. The semantic regions, which are subsets of paths commonly taken by objects and are related to activities in the scene, are also modeled. Under Dual-HDP, both the number of activity categories and the number of semantic regions are learned automatically from data. In this paper, we further extend Dual-HDP to a Dynamic Dual-HDP model, which allows dynamic updating of activity models and online detection of normal and abnormal activities. Experiments are evaluated on a simulated data set and two real data sets, which include 8,478 radar tracks collected from a maritime port and 40,453 visual tracks collected from a parking lot. Part of this work was published in Wang et al. (2008).
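To make the trajectories-as-documents idea concrete, the sketch below quantizes each observation into a grid-cell "word", turns every trajectory into a bag-of-words document, clusters trajectories with a topic model, and flags low-likelihood trajectories as abnormal. It is only an illustration of the analogy, not the Dual-HDP model itself: sklearn's LatentDirichletAllocation stands in for Dual-HDP, so the number of activities is fixed by hand rather than inferred from data, and the grid size, scene dimensions, and function names are assumptions.

```python
# Minimal sketch of "trajectories as documents, observations as words".
# NOT Dual-HDP: an LDA stand-in with a hand-picked number of activities.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

GRID = 20          # quantize the scene into a GRID x GRID codebook of location words
N_ACTIVITIES = 5   # fixed here; Dual-HDP would infer this from the data

def trajectory_to_bow(points, scene_w, scene_h):
    """Map a trajectory (list of (x, y) observations) to a bag-of-words vector."""
    bow = np.zeros(GRID * GRID)
    for x, y in points:
        col = min(int(x / scene_w * GRID), GRID - 1)
        row = min(int(y / scene_h * GRID), GRID - 1)
        bow[row * GRID + col] += 1
    return bow

def fit_and_score(trajectories, scene_w=640.0, scene_h=480.0):
    """Cluster trajectories into activities and score each one for abnormality."""
    X = np.vstack([trajectory_to_bow(t, scene_w, scene_h) for t in trajectories])
    lda = LatentDirichletAllocation(n_components=N_ACTIVITIES, random_state=0).fit(X)

    # per-trajectory average log-likelihood: low values flag abnormal trajectories
    topic_word = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
    doc_topic = lda.transform(X)                    # mixture over activities
    word_prob = doc_topic @ topic_word              # p(word | trajectory)
    loglik = (X * np.log(word_prob + 1e-12)).sum(1) / np.maximum(X.sum(1), 1)

    activities = doc_topic.argmax(axis=1)           # hard activity assignment
    return activities, loglik
```

In this toy version, a trajectory cutting across a region never visited by the training data produces words with near-zero probability under every topic, so its average log-likelihood drops and it is flagged, mirroring the low-likelihood abnormality criterion described above.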
Searching for a target object in a cluttered scene constitutes a fundamental challenge in everyday vision. Visual search must be selective enough to discriminate the target from distractors, invariant to changes in the target's appearance, efficient enough to avoid exhaustive exploration of the image, and able to generalize to locate novel target objects with zero-shot training. Previous work on visual search has focused on searching for perfect matches of a target after extensive category-specific training. Here, we show for the first time that humans can efficiently and invariantly search for natural objects in complex scenes. To gain insight into the mechanisms that guide visual search, we propose a biologically inspired computational model that can locate targets without exhaustive sampling and can generalize to novel objects. The model provides an approximation to the mechanisms that integrate bottom-up and top-down signals during search in natural scenes.
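One common way to turn "top-down, target-modulated search" into code is sketched below: the target's feature vector is matched against every location of a scene feature map to produce an attention map, and fixations are taken greedily with a crude inhibition-of-return mask so the same spot is not revisited. This is a hedged illustration, not the authors' model; the feature maps are assumed to come from some pretrained CNN layer (not implemented here), and the function names and parameters are made up.

```python
# Minimal sketch of top-down feature-matching attention with greedy fixations.
# scene_feat: (H, W, C) feature map of the scene; target_feat: (C,) target features.
import numpy as np

def attention_map(scene_feat, target_feat, eps=1e-8):
    """Cosine similarity between the target features and every scene location."""
    norms = np.linalg.norm(scene_feat, axis=-1) * np.linalg.norm(target_feat) + eps
    return (scene_feat @ target_feat) / norms

def search(scene_feat, target_feat, max_fixations=10, inhibition_radius=2):
    """Greedy fixation sequence; visited neighborhoods are suppressed."""
    amap = attention_map(scene_feat, target_feat).copy()
    fixations = []
    for _ in range(max_fixations):
        r, c = np.unravel_index(np.argmax(amap), amap.shape)
        fixations.append((r, c))
        r0, r1 = max(r - inhibition_radius, 0), r + inhibition_radius + 1
        c0, c1 = max(c - inhibition_radius, 0), c + inhibition_radius + 1
        amap[r0:r1, c0:c1] = -np.inf      # crude inhibition of return
    return fixations
```

Because fixations follow the peaks of the attention map rather than a raster scan, the target is typically found in a handful of fixations instead of an exhaustive sweep of the image, which is the behavior the model above is meant to capture.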
Socially intelligent agents are of growing interest in artificial intelligence. To this end, we need systems that can understand social relationships in diverse social contexts. Inferring the social context in a given visual scene involves not only recognizing objects, but also a more in-depth understanding of the relationships and attributes of the people involved. To achieve this, one computational approach for representing human relationships and attributes is to use an explicit knowledge graph, which allows for high-level reasoning. We introduce a novel end-to-end trainable neural network that is capable of generating a Social Relationship Graph, a structured, unified representation of social relationships and attributes, from a given input image. Our Social Relationship Graph Generation Network (SRG-GN) is the first to use memory cells such as Gated Recurrent Units (GRUs) to iteratively update the social relationship states in a graph using scene and attribute context. The network exploits the recurrent connections among the GRUs to implement message passing between nodes and edges in the graph, and yields significant improvements over previous methods for social relationship recognition.
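The sketch below shows one generic way to implement GRU-based message passing between nodes and edges of a graph, in the spirit of the iterative state updates described above. It is not the SRG-GN architecture: the node features (e.g. per-person attribute features), edge features (e.g. pairwise scene context), feature dimensions, number of iterations, and aggregation scheme are all assumptions made for illustration.

```python
# Minimal sketch of node/edge message passing with GRU memory cells (not SRG-GN).
import torch
import torch.nn as nn

class GraphGRU(nn.Module):
    def __init__(self, dim=128, steps=3):
        super().__init__()
        self.steps = steps
        self.node_gru = nn.GRUCell(dim, dim)       # updates person (node) states
        self.edge_gru = nn.GRUCell(dim, dim)       # updates relationship (edge) states
        self.node_to_edge = nn.Linear(2 * dim, dim)

    def forward(self, node_feats, edge_feats, edges):
        # node_feats: (N, dim), edge_feats: (E, dim), edges: list of (i, j) index pairs
        h_nodes, h_edges = node_feats, edge_feats
        src = torch.tensor([i for i, _ in edges])
        dst = torch.tensor([j for _, j in edges])
        for _ in range(self.steps):
            # message to each edge: the states of its two endpoint nodes
            edge_msg = self.node_to_edge(torch.cat([h_nodes[src], h_nodes[dst]], dim=-1))
            h_edges = self.edge_gru(edge_msg, h_edges)
            # message to each node: mean of the states of its incident edges
            node_msg = torch.zeros_like(h_nodes)
            deg = torch.zeros(h_nodes.size(0), 1)
            for k, (i, j) in enumerate(edges):
                node_msg[i] = node_msg[i] + h_edges[k]
                node_msg[j] = node_msg[j] + h_edges[k]
                deg[i] += 1
                deg[j] += 1
            h_nodes = self.node_gru(node_msg / deg.clamp(min=1), h_nodes)
        return h_nodes, h_edges   # would feed relationship / attribute classifiers
```

After a few update steps, each edge state carries context from both endpoints and, indirectly, from neighboring edges, which is what allows relationship predictions to be refined jointly rather than classified independently per pair.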