Proceedings of the 27th ACM International Conference on Multimedia 2019
DOI: 10.1145/3343031.3350978

Long Short-Term Relation Networks for Video Action Detection

Abstract: It has been well recognized that modeling human-object or object-object relations would be helpful for detection task. Nevertheless, the problem is not trivial especially when exploring the interactions between human actor, object and scene (collectively as human-context) to boost video action detectors. The difficulty originates from the aspect that reliable relations in a video should depend on not only short-term human-context relation in the present clip but also the temporal dynamics distilled over a long-r…

Cited by 26 publications
(11 citation statements)
References 48 publications
“…For example, [4,23,44] adopts GCN to build a reasoning module to model the relations between disjoint and distant regions. [21,48] takes dense object proposals as graph nodes and learns the relations between them. [22] treats each object proposal detected in the sample frames as a graph node and then searches adaptive network structures to model the object interactions.…”
Section: Related Work (mentioning)
confidence: 99%
“…Multi-modal action indeed has extended its popularity to many applications including recognition [47], generative multi-view action [40], detection [16], prediction [50], egocentric action [14], video identification [57], emotion with concept selection [48], personalized recommendation [44], human-object contour [53] and electromyography-vision [41]. The development of a low-cost depth sensor (Microsoft Kinect) opens up a new dimension of tackling the tasks of human action recognition.…”
Section: Related Work 2.1 Multi-modal Action Recognition (mentioning)
confidence: 99%
“…Deep Neural Networks have shown superior and impressive performance on many tasks and even are the state-of-the-art methods in many real-world applications such as image classification [34], object detection [16,38], video action recognition [14,47,50,57], machine translation [3,25,35], and speech recognition [28]. However, these deep neural networks are notoriously well-known for their vulnerability [2,8].…”
Section: Introduction (mentioning)
confidence: 99%
“…detection [2,12,22] etc., it is still difficult for the machine to understand video content in a fine-grained and structured level. To tackle this issue, visual relation is one of the most important and useful information that can help describe the dynamic interactions between the objects in a video.…”
Section: Introduction (mentioning)
confidence: 99%