Proceedings of the 2019 International Conference on Multimedia Retrieval (ICMR '19)
DOI: 10.1145/3323873.3325056
Annotating Objects and Relations in User-Generated Videos

Cited by 125 publications (87 citation statements)
References 29 publications

Citation statements, ordered by relevance:
“…A performance evaluation experiment of VSGG-Net is performed using two benchmark datasets, VidOR [10] (https://xdshang.github.io/docs/vidor.html) and VidVRD [3] (https://xdshang.github.io/docs/imagenet-vidvrd.html). The VidOR video dataset includes 80 object types and 50 relationship types.…”
Section: Experiments 4.1 Dataset and Model Training
confidence: 99%
“…Besides, they proposed an online association method with a siamese network and obtained the state-of-the-art results by combining these two parts. [18] contributed a large-scale VidOR dataset for VidVRD. On this dataset, [23] utilized language context feature along with spatial-temporal feature for predicate prediction and won the first place at VRU'19 (Video Relation Understanding 2019) grand challenge.…”
Section: Related Work
confidence: 99%
“…We evaluate our method on two datasets: the benchmark ImageNet-VidVRD dataset [19] and the newly released VidOR dataset [18]. ImageNet-VidVRD is the first dataset for VidVRD, which consists of 1,000 videos collected from ILSVRC2016-VID and is split into 800 training videos and 200 test videos.…”
Section: Experiments 4.1 Datasets
confidence: 99%
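To make the dataset figures quoted above easier to check, the sketch below tallies object and predicate categories from a single VidOR annotation file. It assumes the per-video JSON layout described on the dataset page (fields such as "subject/objects", "relation_instances", "category", and "predicate"); the file path is hypothetical, and the field names should be verified against the actual release.

```python
import json
from collections import Counter

# Hypothetical path: the VidOR release groups one JSON annotation file per video,
# organised under training/ and validation/ directories.
ANNOTATION_FILE = "vidor/annotations/training/0000/2401075277.json"

with open(ANNOTATION_FILE) as f:
    anno = json.load(f)

# Each annotated object pairs a track id ("tid") with a category label,
# drawn from the dataset's 80 object types.
object_counts = Counter(obj["category"] for obj in anno["subject/objects"])

# Each relation instance links a subject track to an object track over a frame
# span via a predicate, drawn from the dataset's 50 relationship types.
predicate_counts = Counter(rel["predicate"] for rel in anno["relation_instances"])

print("object categories in this video:", object_counts)
print("predicates in this video:", predicate_counts)
```

Aggregating these counters over the full training split would recover the 80 object types and 50 relationship types reported in the snippets above.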
“…While human perception typically involves inferring the physical attributes about the humans (detection [5,35,43,50], poses [3,4,8,25,28,41], shape [13,20,29,30], gaze [44] etc.), interpreting humans involves reasoning about the finer details relating to human activity [6,24,27,48,49], behaviour [26,34], human-object visual relationship detection [23,33,36,37,39,40], and human-object interactions [23,32,33,36,37,39,40,42]. In this work, we investigate the problem of identifying Human-Object Interactions in videos.…”
Section: Introduction
confidence: 99%