2020
DOI: 10.1007/978-3-030-58558-7_35

Large Scale Holistic Video Understanding

Cited by 86 publications (83 citation statements)
References 52 publications
“…MTL has been applied to other problems too. This includes various domains, such as language [24], [98], [99], audio [100], video [101], [102], and robotics [103], [104], as well as with different learning paradigms, such as reinforcement learning [105], [106], self-supervised learning [107], semisupervised learning [108], [109] and active learning [110], [111]. Surprisingly, in the deep learning era, very few works have considered MTL under the semi-supervised or active learning setting.…”
Section: Other (mentioning)
confidence: 99%
“…Diba et al [97] introduced a new spatio-temporal, deep neural network architecture named “Holistic Presence and Temporal Network” (HATNet) that builds on the convergence of 2D and 3D architectures into one by integrating intermediate representations of presence and temporal signals.…”
Section: Computer Vision Applications (mentioning)
confidence: 99%
“…For example, the user might want to search the specified moment of eating a hamburger, a piece of cake, or pizza. Diba et al [7] construct a dataset for video recognition which contains the labels of scene, object, action, event, attribute, and concepts for each short video snippet. However, the information about the subject and the object of the action is not provided.…”
Section: Related Work (mentioning)
confidence: 99%