2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017
DOI: 10.1109/cvpr.2017.503

Identifying First-Person Camera Wearers in Third-Person Videos

Abstract: We consider scenarios in which we wish to perform joint scene understanding, object tracking, activity recognition, and other tasks in environments in which multiple people are wearing body-worn cameras while a third-person static camera also captures the scene. To do this, we need to establish person-level correspondences across first- and third-person videos, which is challenging because the camera wearer is not visible from his/her own egocentric video, preventing the use of direct feature matching. In this p…

Cited by 47 publications (48 citation statements)
References 27 publications
“…Baselines We first implement multiple baselines to compare the performance considering inputs, and models. These baseline method are proposed in peer researches [14,27,23] including spatial-domain siamese network [14], motion-domain siamese network [14], two-stream semi-siamese network [14], triplet network [27], and temporal domain image and flow network [14,23]. We also demonstrate the weight share performance for siamese network.…”
Section: Results and Comparison (mentioning)
confidence: 91%
“…The authors proposed a 'Graph' representation for temporal and spatial matching. In [14], the authors solved the task to localize the person in the third view if given the both the third and ego camera frames. In this paper, spatial-domain semi-siamese, motion-domain semi-siamese, dual-domain semi-siamese, and dual-domain semi-triplet networks are well studied.…”
Section: Related Work (mentioning)
confidence: 99%
“…Our hypothesis is that simultaneous segmentation and matching is mutually beneficial: segmentation helps refine matching by producing finer-grained appearance features (compared to bounding boxes), which are important in crowded scenes with many occlusions, while matching helps locate a person of interest and produce better segmentation masks, which in turn help in tasks like activity and action recognition. We show that previous work [14] is a special case of ours, since we can naturally handle their first- and third-person cases. We evaluate on two publicly available datasets augmented with pixel-level annotations, showing that we achieve significantly better results than numerous baselines.…”
Section: Introduction (mentioning)
confidence: 84%
“…That paper's approach is applicable in closed settings with overhead cameras (e.g., a museum), but not in unconstrained environments such as our law enforcement example. Fan et al [14] relax many assumptions, allowing arbitrary third-person camera views and including evidence based on scene appearance. Zheng et al [43] consider the distinct problem of identifying the same person appearing in multiple wearable camera videos (but not trying to identify the camera wearers themselves).…”
Section: Introduction (mentioning)
confidence: 99%