2019
DOI: 10.1007/978-3-030-25590-9_6
Object Detection-Based Location and Activity Classification from Egocentric Videos: A Systematic Analysis

Abstract: Egocentric vision has emerged in the daily practice of application domains such as lifelogging, activity monitoring, robot navigation and the analysis of social interactions. Plenty of research focuses on location detection and activity recognition, with applications in the area of Ambient Assisted Living. The basis of this work is the idea that indoor locations and daily activities can be characterized by the presence of specific objects. Objects can be obtained either from laborious human annotations or auto…
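The core idea of the abstract (characterizing indoor locations by which objects are present) can be illustrated with a minimal sketch. This is not the paper's implementation: the object vocabulary, the toy segments, and the LinearSVC classifier below are illustrative assumptions.

```python
# Minimal sketch (not the paper's implementation) of location classification
# from object presence. Vocabulary, segments and classifier are illustrative.
import numpy as np
from sklearn.svm import LinearSVC

OBJECT_CLASSES = ["fridge", "kettle", "sofa", "toothbrush"]  # hypothetical vocabulary

def presence_vector(detected_objects):
    """Binary vector: 1 where an object class was detected in the segment."""
    return np.array([1.0 if c in detected_objects else 0.0 for c in OBJECT_CLASSES])

# Hypothetical training data: detected objects per segment and the location label.
segments = [({"fridge", "kettle"}, "kitchen"),
            ({"sofa"}, "living_room"),
            ({"toothbrush"}, "bathroom")]
X = np.stack([presence_vector(objs) for objs, _ in segments])
y = [label for _, label in segments]

clf = LinearSVC().fit(X, y)
print(clf.predict([presence_vector({"kettle"})]))  # likely ['kitchen']: kettle only co-occurs with that label
```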

Cited by 6 publications (5 citation statements)
References 44 publications (83 reference statements)
“…In [61], [62] optical flow was employed to detect salient regions, which were cropped from the original RGB frames and were given to the network as a second, more focused RGB stream. Other input modalities have been employed including depth [7], [41], egocentric cues comprising hand [63], [64], [65] and object regions [64], [66], [67], head motions [63] and gaze-based saliency maps [63], [65], sensor-based modalities [15], [56], [59] and sound [43], [68], [69]. In [38], [40] object and hand localization and segmentation were intermediate learning steps that forced the network to focus on important egocentric cues prior to action prediction.…”
Section: Video Activity Recognition
confidence: 99%
“…• The Binary Presence Vector (BPV) of objects [68,70] from Chapter 3.2.3, consisting of zeros and ones with length equal to the number of noun classes of EPIC-Kitchens (352). The BPVs are concatenated to the hand coordinates for every frame and the feature size increases to 356 (352 + 4).…”
Section: Methods
confidence: 99%
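A minimal sketch of the per-frame feature described in the statement above, assuming the quoted setup: a 352-dimensional binary presence vector over the EPIC-Kitchens noun classes concatenated with four hand coordinates, yielding a 356-dimensional feature. Function names and the hand-box format are illustrative assumptions.

```python
# Minimal sketch, assuming the setup quoted above: a Binary Presence Vector
# over the 352 EPIC-Kitchens noun classes concatenated with 4 hand coordinates,
# giving a 356-dimensional per-frame feature. Names and box format are illustrative.
import numpy as np

NUM_NOUN_CLASSES = 352  # EPIC-Kitchens noun vocabulary size

def frame_feature(detected_noun_ids, hand_box):
    """detected_noun_ids: indices of noun classes detected in the frame.
    hand_box: 4 hand coordinates, e.g. (x1, y1, x2, y2)."""
    bpv = np.zeros(NUM_NOUN_CLASSES, dtype=np.float32)
    bpv[list(detected_noun_ids)] = 1.0
    return np.concatenate([bpv, np.asarray(hand_box, dtype=np.float32)])

feat = frame_feature({3, 41, 120}, (0.12, 0.30, 0.25, 0.48))
print(feat.shape)  # (356,)
```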
“…Furthermore, we reflect on the effect of object detection quality on the location and activity recognition outputs. Parts of this chapter are published in [68,70].…”
Section: Thesis Outline
confidence: 99%