A Game-Theoretic Probabilistic Approach for Detecting Conversational Groups

Vascon, Sebastiano; Mequanint, Eyasu Zemene; Cristani, Marco; Hung, Hayley; Pelillo, Marcello; Murino, Vittorio

doi:10.1007/978-3-319-16814-2_43

Cited by 46 publications

(60 citation statements)

References 51 publications

“…Secondly, they also use a probabilistic motion analysis to extract interesting spatio-temporal patterns for scenario recognition. Vascon et al [23] detect conversational groups in crowded scenes of people. The approach uses pairwise affinities between people based on pose and a game-theoretic clustering procedure.…”

Section: Group Activity Recognition (mentioning)

confidence: 99%

A Hierarchical Deep Temporal Model for Group Activity Recognition

Ibrahim, Muralidharan, Deng, et al. 2016

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

In group activity recognition, the temporal dynamics of the whole activity can be inferred based on the dynamics of the individual people representing the activity. We build a deep model to capture these dynamics based on LSTM (long short-term memory) models. To make use of these observations, we present a 2-stage deep temporal model for the group activity recognition problem. In our model, an LSTM model is designed to represent action dynamics of individual people in a sequence and another LSTM model is designed to aggregate person-level information for whole activity understanding. We evaluate our model over two datasets: the Collective Activity Dataset and a new volleyball dataset. Experimental results demonstrate that our proposed model improves group activity recognition performance compared to baseline methods.

“…These constraint-based formations are shown in Figure 3. Formations are considered very useful in analyzing and increasing the quality of interaction in social interactions [1,10,11], and a number of works [19][20][21][22][23][24][25] have proposed different methods to detect F-formations automatically. The Hough voting strategy (density estimation) was used to locate the O-space (see Figure 2a) by considering each person's position and head orientation in [19].…”

Section: Background and Related Work (mentioning)

confidence: 99%

“…Frustum of attention to extract features from individuals and accordingly classify associates, singletons, and members of F-formations was used in [25]. Vascon et al [23] developed a game-theoretic model embedding the social-psychological concept of an F-formation and the biological constraints of social attention. They generated a frustum based on the position and orientation of each person and computed affinity to extract the F-formation.…”

Section: Background and Related Work (mentioning)

confidence: 99%

F-Formations for Social Interaction in Simulation Using Virtual Agents and Mobile Robotic Telepresence Systems

Pathi, Kristoffersson, Kiselev, et al. 2019

MTI

F-formations are a set of possible patterns in which groups of people tend to spatially organize themselves while engaging in social interactions. In this paper, we study the behavior of teleoperators of mobile robotic telepresence systems to determine whether they adhere to spatial formations when navigating to groups. This work uses a simulated environment in which teleoperators are requested to navigate to different groups of virtual agents. The simulated environment represents a conference lobby scenario where multiple groups of Virtual Agents with varying group sizes are placed in different spatial formations. The task requires teleoperators to navigate a robot to join each group using an egocentric-perspective camera. In a second phase, teleoperators are allowed to evaluate their own performance by reviewing how they navigated the robot from an exocentric perspective. The two important outcomes from this study are, firstly, teleoperators inherently respect F-formations even when operating a mobile robotic telepresence system. Secondly, teleoperators prefer additional support in order to correctly navigate the robot into a preferred position that adheres to F-formations.

“…Also, errors observed for head pose are considerably smaller than for body pose over all four camera views: this is because body pose classifiers are impeded by severe occlusions in crowded scenes. Precisely for this reason, previous works on F-formation detection from FCGs [30], [32], [59] have primarily employed head orientation, even though body pose has been widely acknowledged as the more reliable cue for determining interacting persons. We believe that devising a multimodal approach also employing IR and bluetooth-based sensors for body pose estimation would be advantageous as compared to a purely visual analysis, which was one of the primary motives for compiling the SALSA dataset.…”

Section: Head and Body Pose Estimation From Visual Data (mentioning)

confidence: 99%

SALSA: A Novel Dataset for Multimodal Group Behavior Analysis

Alameda-Pineda, Staiano, Subramanian, et al. 2016

IEEE Trans. Pattern Anal. Mach. Intell.

Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., cocktail party) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing native behavioral and personality traits) levels. However, analyzing social scenes involving FCGs is also highly challenging due to the difficulty in extracting behavioral cues such as target locations, their speaking activity and head/body pose due to crowdedness and presence of extreme occlusions. To this end, we propose SALSA, a novel dataset facilitating multimodal and Synergetic sociAL Scene Analysis, and make two main contributions to research on automated social interaction analysis: (1) SALSA records social interactions among 18 participants in a natural, indoor environment for over 60 minutes, under the poster presentation and cocktail party contexts presenting difficulties in the form of low-resolution images, lighting variations, numerous occlusions, reverberations and interfering sound sources; (2) To alleviate these problems we facilitate multimodal analysis by recording the social interplay using four static surveillance cameras and sociometric badges worn by each participant, comprising the microphone, accelerometer, bluetooth and infrared sensors. In addition to raw data, we also provide annotations concerning individuals' personality as well as their position, head, body orientation and F-formation information over the entire event duration. Through extensive experiments with state-of-the-art approaches, we show (a) the limitations of current methods and (b) how the recorded multiple cues synergetically aid automatic analysis of social interactions. SALSA is available at http://tev.fbk.eu/salsa.
