Multi-stream Fusion Model for Social Relation Recognition from Videos

Lv, Jinna; Liu, Wu; Zhou, Lili; Wu, Bin; Ma, Huadong

doi:10.1007/978-3-319-73603-7_29

Cited by 34 publications

(24 citation statements)

References 16 publications

“…According to the above three viewpoints, the superiority of our method can be verified. For performance evaluation, we use the F1_mi, F1_ma, Acc, and Sub_acc [14]. Table 2 and 3 show the comparison results of our model with evaluation.…”

Section: Results (mentioning)

confidence: 99%

“…The dataset used in this paper is collected from movies and TV dramas, named SRIV [14]. The dataset is available at https://github.com/happyheart866/SRIV.…”

Section: Dataset and Methods In Comparison (mentioning)

confidence: 99%

“…TSN: TSN [38] is a typical two-stream CNN network which has achieved the state-of-the-art performance on many video classification datasets. Multi-stream: Multiple features representing social relations were used to improve the recognition performance [14].…”

Section: Dataset and Methods In Comparison (mentioning)

confidence: 99%

“…In recent years, some studies worked on well segmented sequence videos, or all of the available frames in video sequences [13]. Some methods simply computed summary statistics the relation traits over the whole video [14]. However, not every happening interaction during the video sequences will be relevant to recognize the social relations.…”

Section: Copyright © 2019 the Institute Of Electronics Information A (mentioning)

confidence: 99%

Attentive Sequences Recurrent Network for Social Relation Recognition from Video

Zhang

et al. 2019

IEICE Trans. Inf. & Syst.

Self Cite

Jinna LV, Member, Bin WU, Yunlei ZHANG, Nonmembers, and Yunpeng XIAO, Member

SUMMARY Recently, social relation analysis receives an increasing amount of attention from text to image data. However, social relation analysis from video is an important problem, which is lacking in the current literature. There are still some challenges: 1) it is hard to learn a satisfactory mapping function from low-level pixels to high-level social relation space; 2) how to efficiently select the most relevant information from noisy and unsegmented video. In this paper, we present an Attentive Sequences Recurrent Network model, called ASRN, to deal with the above challenges. First, in order to explore multiple clues, we design a Multiple Feature Attention (MFA) mechanism to fuse multiple visual features (i.e. image, motion, body, and face). Through this manner, we can generate an appropriate mapping function from low-level video pixels to high-level social relation space. Second, we design a sequence recurrent network based on Global and Local Attention (GLA) mechanism. Specially, an attention mechanism is used in GLA to integrate global feature with local sequence feature to select more relevant sequences for the recognition task. Therefore, the GLA module can better deal with noisy and unsegmented video. At last, extensive experiments on the SRIV dataset demonstrate the performance of our ASRN model.

“…birthday child in a birthday party). Lv et al (2018) propose to use multimodal data for social relation classification in TV shows and movies. Fan et al (2018) analyze shared attention in social scene videos.…”

Section: Social Relationship (mentioning)

confidence: 99%

Visual Social Relationship Recognition

Wong

Zhao

et al. 2020

Int J Comput Vis

Social relationships form the basis of social structure of humans. Developing computational models to understand social relationships from visual data is essential for building intelligent machines that can better interact with humans in a social environment. In this work, we study the problem of visual social relationship recognition in images. We propose a Dual-Glance model for social relationship recognition, where the first glance fixates at the person of interest and the second glance deploys attention mechanism to exploit contextual cues. To enable this study, we curated a large scale People in Social Context (PISC) dataset, which comprises of 23,311 images and 79,244 person pairs with annotated social relationships. Since visually identifying social relationship bears certain degree of uncertainty, we further propose an Adaptive Focal Loss to leverage the ambiguous annotations for more effective learning. We conduct extensive experiments to quantitatively and qualitatively demonstrate the efficacy of our proposed method, which yields state-of-the-art performance on social relationship recognition.

Spatio-Temporal Attention Model Based on Multi-view for Social Relation Understanding

2018

Lecture Notes in Computer Science
