Tianmin Shu scite author profile

CERN: Confidence-Energy Recurrent Network for Group Activity Recognition

This work is about recognizing human activities occurring in videos at distinct semantic levels, including individual actions, interactions, and group activities. The recognition is realized using a two-level hierarchy of Long Short-Term Memory (LSTM) networks, forming a feed-forward deep architecture that can be trained end-to-end. In comparison with existing LSTM architectures, we make two key contributions that give our approach its name, Confidence-Energy Recurrent Network (CERN). First, instead of using the common softmax layer for prediction, we specify a novel energy layer (EL) for estimating the energy of our predictions. Second, rather than finding the common minimum-energy class assignment, which may be numerically unstable under uncertainty, we specify that the EL additionally computes the p-values of the solutions, and in this way estimates the most confident energy minimum. The evaluation on the Collective Activity and Volleyball datasets demonstrates: (i) advantages of our two contributions relative to the common softmax and energy-minimization formulations and (ii) superior performance relative to the state-of-the-art approaches.

Learning and Inferring “Dark Matter” and Predicting Human Intents and Trajectories in Videos

Xie, Shu, Todorovic et al. 2018. IEEE Trans. Pattern Anal. Mach. Intell.

This paper presents a method for localizing functional objects and predicting human intents and trajectories in surveillance videos of public spaces, under no supervision in training. People in public spaces are expected to intentionally take shortest paths (subject to obstacles) toward certain objects (e.g., vending machine, picnic table, dumpster, etc.) where they can satisfy certain needs (e.g., quench thirst). Since these objects are typically very small or heavily occluded, they cannot be inferred from their visual appearance, only indirectly from their influence on people's trajectories. Therefore, we call them "dark matter", by analogy to cosmology, since their presence can only be observed as attractive or repulsive "fields" in the public space. A person in the scene is modeled as an intelligent agent engaged in one of the "fields", selected depending on his/her intent. An agent's trajectory is derived from agent-based Lagrangian mechanics. The agents can change their intents in the middle of motion and thus alter the trajectory. For evaluation, we compiled and annotated a new dataset. The results demonstrate the effectiveness of our approach in predicting human intents and trajectories, and in localizing and discovering distinct types of "dark matter" in wide public spaces.

Active Visual Information Gathering for Vision-Language Navigation

Wang, Yang, Shu et al. 2020

Where and Why are They Looking? Jointly Inferring Human Attention and Intentions in Complex Tasks

Wei, Liu, Shu et al. 2018

Joint Mind Modeling for Explanation Generation in Complex Human-Robot Collaborative Tasks

Gao, Gong, Zhao et al. 2020

Tianmin Shu

CERN: Confidence-Energy Recurrent Network for Group Activity Recognition
