2015
DOI: 10.1016/j.jvcir.2015.07.016

Two-person interaction recognition via spatial multiple instance embedding

Abstract: In this work, we look into the problem of recognizing two-person interactions in videos. Our method integrates multiple visual features in a weakly supervised manner by utilizing an embedding-based multiple instance learning framework. In our proposed method, first, several visual features that capture the shape and motion of the interacting people are extracted from each detected person region in a video. Then, two-person visual descriptors are formed. Since the relative spatial locations of int…

Cited by 35 publications (20 citation statements)
References 51 publications
“…Sefidgar et al [113] use the same reasoning to create a model based on discriminative key frames and consider their relative distance and timing within the interaction. Sener and Ikizler-Cinbis [116] formulate interaction detection as a multiple-instance learning problem to select relevant frames, because not all frames in an interaction are considered informative.…”
Section: Template Based Approaches (mentioning)
confidence: 99%
“…Yang et al [20] improve classification in these cases by building detectors for various types of physical interactions such as hand-hand and hand-shoulder touches. The relative distance between individuals has been further explored by Sener and İkizler [21], who formulate interaction detection as a multiple-instance learning problem because not all frames in an interaction are considered informative. Sefidgar et al [22] use the same reasoning to create a model based on discriminative key frames and consider their relative distance and timing within the interaction.…”
Section: Related Work (mentioning)
confidence: 99%
“…The torso parts of both actors are connected through a virtual root part of the graph. This part does not have an associated part detector but it allows us to model relative distances between people, similar to Patron-Perez et al [5] and Sener and İkizler [21]. We enforce that the size of the virtual root part is equal to the size of the entire dyad of bodies, regardless of the locations and sizes of the associated part detectors.…”
Section: Multiple Features Our Model Supports Different Types Of Feat (mentioning)
confidence: 99%
“…Assorted approaches for action recognition in HAR have been proposed by Marín-Jiménez MJ, Yeguas (2013) [16], where Spatio-Temporal Interest points (STIP) are identified to enable the action recognition better. Spatio-Temporal Interest points were extracted from video sequences through Harris3D actions in videos where different types of low- and middle-level features have been used to predict action. Sener F, Ikizler-Cinbis, (2015) [17] elaborated the interaction between two people in action recognition. Tracing interacting persons, acquiring visual descriptors together with the distance between the actors, in the multiple learning process help to recognize the action.…”
Section: Related Work (mentioning)
confidence: 99%