E²(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition

Plizzari, Chiara; Goletto, Gabriele; Cannici, Marco; Gusso, Emanuele; Matteucci, Matteo; Caputo, Barbara

doi:10.1109/cvpr52688.2022.01931

Cited by 33 publications

(11 citation statements)

References 74 publications


“…Similarly, the transferability of Kinetics to other popular action recognition datasets has been studied directly [18] or indirectly to benchmark architectures [47,13]. Kinetics pretraining has also been applied to other action recognition settings, such as egocentric actions [33], action recognition from drones [9] or actions in the dark [52]. Other studies have used Kinetics to initialize models for more distant video tasks, including sign language recognition [25] or autonomous vehicle decision-making [41].…”

Section: Transfer Learning From ImageNet and Kinetics (mentioning)

confidence: 99%

Trailers12k: Improving Transfer Learning with a Dual Image and Video Transformer for Multi-Label Movie Trailer Genre Classification

Montalvo-Lezama, Montalvo-Lezama, Fuentes-Pineda

2022

SSRN Journal


“…Another important aspect to consider when posing real-time constraints is that, in the context of egocentric vision, many techniques attain notable results only by leveraging non-real-time secondary modalities such as optical flow. Although this modality is highly successful, it has a high computational cost [36], [37], which prevents its use in real-time applications and increases the size of the model.…”

Section: Bringing FPAR in the Wild (mentioning)

confidence: 99%

Bringing Online Egocentric Action Recognition into the wild

Goletto, Caputo, Averta

2022

Preprint


To enable safe and effective human-robot cooperation, it is crucial to develop models that identify human activities. Egocentric vision is a viable way to address this problem, and many works therefore provide deep learning solutions to infer human actions from first-person videos. However, although very promising, most of these do not consider the major challenges that come with realistic deployment, such as the portability of the model, the need for real-time inference, and robustness to novel domains (i.e., new spaces, users, tasks). With this paper, we set the boundaries that egocentric vision models should consider for realistic applications, defining a novel setting of egocentric action recognition in the wild, which encourages researchers to develop novel, application-aware solutions. We also present a new model-agnostic technique that enables the rapid repurposing of existing architectures in this new context, demonstrating the feasibility of deploying a model on a tiny device (Jetson Nano) and performing the task directly on the edge with very low energy consumption (2.4 W on average at 50 fps).


“…Despite the abundance of conventional frame-like datasets, there is a noticeable scarcity of event-based action recognition datasets. As for the simulated datasets, N-EPIC-Kitchens [75] is an event version of the EPIC-Kitchens generated by the event camera simulator. The event UCF-50 [76] is derived from the UCF-50 action recognition dataset, which was captured by displaying its data on a monitor.…”

Section: Datasets For Action Recognition (mentioning)

confidence: 99%

Hypergraph Neural Network for Skeleton-Based Action Recognition

Hao, Guo, et al.

2021

IEEE Trans. on Image Process.


Action recognition from video data is a cornerstone task with wide-ranging applications. Single-view action recognition is limited by its reliance on a single viewpoint. In contrast, multi-view approaches capture complementary information from various viewpoints for improved accuracy. Recently, event cameras have emerged as innovative bio-inspired sensors, leading to advances in event-based action recognition. However, existing works predominantly focus on single-view scenarios, leaving a gap in the exploitation of multi-view event data, particularly with respect to challenges such as information deficit and semantic misalignment. To bridge this gap, we introduce HyperMV, a multi-view event-based action recognition framework. HyperMV converts discrete event data into frame-like representations and extracts view-related features using a shared convolutional network. By treating segments as vertices and constructing hyperedges with rule-based and KNN-based strategies, it builds a multi-view hypergraph neural network that captures relationships across viewpoint and temporal features. Vertex attention hypergraph propagation is also introduced for enhanced feature fusion. To promote research in this area, we present the largest multi-view event-based action dataset, THU MV-EACT-50, comprising 50 actions from 6 viewpoints, which is more than ten times larger than existing datasets. Experimental results show that HyperMV significantly outperforms baselines in both cross-subject and cross-view scenarios, and also exceeds the state of the art in frame-based multi-view action recognition.
