Mathew Monfort scite author profile

Moments in Time Dataset: One Million Videos for Event Understanding

We present the Moments in Time Dataset, a large-scale human-annotated collection of one million short videos corresponding to dynamic events unfolding within three seconds. Modeling the spatial-audio-temporal dynamics even for actions occurring in 3-second videos poses many challenges: meaningful events involve not only people, but also objects, animals, and natural phenomena; visual and auditory events can be symmetrical in time ("opening" is "closing" in reverse), and either transient or sustained. We describe the annotation process of our dataset (each video is tagged with one action or activity label among 339 different classes), analyze its scale and diversity in comparison to other large-scale video datasets for action recognition, and report results of several baseline models addressing three modalities (spatial, temporal and auditory), both separately and jointly. The Moments in Time dataset, designed to have a large coverage and diversity of events in both visual and auditory modalities, can serve as a new challenge to develop models that scale to the level of complexity and abstract reasoning that a human processes on a daily basis.

Multi-Agent Tensor Fusion for Contextual Trajectory Prediction

Zhao, Monfort, et al. 2019

415

265

Accurate prediction of others' trajectories is essential for autonomous driving. Trajectory prediction is challenging because it requires reasoning about agents' past movements, social interactions among varying numbers and kinds of agents, constraints from the scene context, and the stochasticity of human behavior. Our approach models these interactions and constraints jointly within a novel Multi-Agent Tensor Fusion (MATF) network. Specifically, the model encodes multiple agents' past trajectories and the scene context into a Multi-Agent Tensor, then applies convolutional fusion to capture multi-agent interactions while retaining the spatial structure of agents and the scene context. The model then recurrently decodes multiple agents' future trajectories, using an adversarial loss to learn stochastic predictions. Experiments on both highway driving and pedestrian crowd datasets show that the model achieves state-of-the-art prediction accuracy.
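The encoding and fusion steps described in this abstract can be illustrated with a small, dependency-free sketch. It is not the MATF implementation: the per-agent encoder (`encode_trajectory`), the rule for combining agents that land in the same grid cell (element-wise max), and the 3x3 mean filter standing in for the learned convolutional fusion are all hypothetical stand-ins, and the adversarially trained recurrent decoder is omitted. What it shows is the core idea: agent encodings are placed at their spatial locations and stacked channel-wise with scene features, so a spatial filter can mix nearby agents' information without losing the grid layout.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Grid = List[List[List[float]]]  # indexed [row][col][channel]


def encode_trajectory(traj: Sequence[Point]) -> List[float]:
    """Hypothetical stand-in for a learned per-agent encoder:
    returns [last_x, last_y, mean_dx, mean_dy] over the observed steps."""
    (x0, y0), (xn, yn) = traj[0], traj[-1]
    steps = max(len(traj) - 1, 1)
    return [xn, yn, (xn - x0) / steps, (yn - y0) / steps]


def build_multi_agent_tensor(scene: Grid, agents: Sequence[Sequence[Point]],
                             cell_size: float) -> Grid:
    """Place each agent's encoding at the grid cell of its last observed
    position and concatenate it channel-wise with the scene features."""
    h, w = len(scene), len(scene[0])
    occupied: Dict[Tuple[int, int], List[float]] = {}
    for traj in agents:
        x, y = traj[-1]
        row = max(0, min(int(y // cell_size), h - 1))
        col = max(0, min(int(x // cell_size), w - 1))
        enc = encode_trajectory(traj)
        prev = occupied.get((row, col))
        # Assumption: agents sharing a cell are combined by element-wise max.
        occupied[(row, col)] = enc if prev is None else [max(a, b) for a, b in zip(prev, enc)]
    zeros = [0.0] * 4  # matches the length of encode_trajectory's output
    return [[list(scene[r][c]) + list(occupied.get((r, c), zeros))
             for c in range(w)] for r in range(h)]


def fuse_3x3(tensor: Grid) -> Grid:
    """Stand-in for the learned convolutional fusion: a 3x3 mean filter with
    zero padding. The output keeps the same H x W layout, so each cell now
    mixes features of nearby agents and scene context."""
    h, w, ch = len(tensor), len(tensor[0]), len(tensor[0][0])
    out: Grid = []
    for r in range(h):
        row_out = []
        for c in range(w):
            acc = [0.0] * ch
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        for k in range(ch):
                            acc[k] += tensor[rr][cc][k]
            row_out.append([v / 9.0 for v in acc])
        out.append(row_out)
    return out


# Usage: a 4x4 scene with one (empty) context channel and a single agent
# moving right along the top row.
scene = [[[0.0] for _ in range(4)] for _ in range(4)]
agents = [[(0.5, 0.5), (1.5, 0.5), (2.5, 0.5)]]
t = build_multi_agent_tensor(scene, agents, cell_size=1.0)
f = fuse_3x3(t)
```

In a learned model the fixed mean filter would be replaced by trainable convolution layers; the fixed filter is used here only to keep the sketch runnable without a deep-learning framework.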

Moments in Time Dataset: one million videos for event understanding

Monfort, Andonian, Zhou, et al. 2018

Preprint

Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding

Monfort, Pan, Ramakrishnan, et al. 2022

IEEE Trans. Pattern Anal. Mach. Intell.

Reasoning About Human-Object Interactions Through Dual Attention Networks

Xiao, Fan, Gutfreund, et al. 2019
