2021
DOI: 10.48550/arxiv.2105.04489
Preprint

Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions

Abstract: When people observe events, they are able to abstract key information and build concise summaries of what is happening. These summaries include contextual and semantic information describing the important high-level details (what, where, who and how) of the observed event and exclude background information that is deemed unimportant to the observer. With this in mind, the descriptions people generate for videos of different dynamic events can greatly improve our understanding of the key information of interest…
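The abstract describes learning a joint embedding space in which videos and their spoken descriptions are aligned. As a rough illustration of this family of methods, the sketch below implements a generic symmetric contrastive (InfoNCE-style) alignment loss in PyTorch. It is an assumption-based sketch, not the paper's own objective (the full text introduces its own contrastive formulation), and every name in it (contrastive_alignment_loss, video_emb, text_emb, temperature) is hypothetical.

    # Minimal sketch of a symmetric contrastive alignment loss.
    # Illustrative only; not the method proposed in this paper.
    import torch
    import torch.nn.functional as F

    def contrastive_alignment_loss(video_emb: torch.Tensor,
                                   text_emb: torch.Tensor,
                                   temperature: float = 0.07) -> torch.Tensor:
        """Symmetric cross-entropy over cosine similarities of paired embeddings.

        video_emb, text_emb: (batch, dim) outputs of a video encoder and a
        caption/transcript encoder; row i of each is assumed to describe the
        same clip.
        """
        v = F.normalize(video_emb, dim=-1)
        t = F.normalize(text_emb, dim=-1)
        logits = v @ t.T / temperature  # (batch, batch) similarity matrix
        targets = torch.arange(v.size(0), device=v.device)
        # Matched (video_i, description_i) pairs are positives; all other
        # pairs in the batch serve as negatives, so clips and the
        # descriptions people give them are pulled together in a joint space.
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.T, targets))

In practice video_emb and text_emb would come from separate encoders over the clip and its spoken caption; the temperature of 0.07 is a common default in contrastive learning, not a value taken from this paper.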

Cited by 1 publication
References 57 publications (118 reference statements)