Three-stream fusion network for first-person interaction recognition

Kim, Ye Ji; Lee, Dong Gyu; Lee, Seong Whan

doi:10.1016/j.patcog.2020.107279

Cited by 8 publications

(6 citation statements)

References 45 publications


“…TSCF, TSDF, and KRP represent Three-stream Correlation Fusion, Three-stream Deep Fusion, and Kernelized Ranked Pooling, respectively. The presented results of [12,16] are reported from [17]. It should be noted that all of the compared methods utilize raw RGB frames.…”

Section: Results (mentioning)

confidence: 99%

“…After extracting feature maps, maximum and average values are considered for the fusion step to obtain a unique feature map. In [17], the same architecture with a new correlation-based fusion approach is utilized. In these two articles, for the classification step, an LSTM network has been exploited.…”

Section: Raw Frame Features (mentioning)

confidence: 99%

“…As a result, our low complexity method utilizing available syntactic elements of the compressed domain can be employed for real-time applications. [15] 68.90% 59.50% LRCN(DOF) [15] 69.10% 89.00% LSTM(RGB_F) [27] 69.40% 70.00% LSTM(DOF) [27] 70.00% 91.00% TSDF(RGB_F) [16] 81.10% TSDF(DOF) [16] 86.40% KRP(RGB_F) [12] 73.80% KRP(DOF) [12] 85.70% TSCF(RGB_F) [17] 88.00% TSCF(DOF) [17] 94.40% Residuals(proposed) 69.33% 88.60%…”

Section: NUSFPID and JPL Datasets (mentioning)

confidence: 99%


Speeding Up Action Recognition Using Dynamic Accumulation of Residuals in Compressed Domain

Abdari; Amirjan; Mansouri

2022

Preprint


With the widespread use of installed cameras, video-based monitoring approaches have attracted considerable attention for purposes such as assisted living. Temporal redundancy and the sheer size of raw videos are the two most common problems for video processing algorithms. Most existing methods focus on increasing accuracy by exploring consecutive frames, which is computationally laborious and unsuitable for real-time applications. Since videos are mostly stored and transmitted in compressed format, compressed videos are available on many devices. Compressed videos contain a great deal of useful information, such as motion vectors and quantized coefficients. Proper use of this information can greatly improve performance on video understanding tasks. This paper presents an approach that directly uses the residual data available in compressed videos, which can be obtained by a lightweight partial decoding procedure. In addition, a method for accumulating similar residuals is proposed, which dramatically reduces the number of frames processed for action recognition. Applying neural networks exclusively to accumulated residuals in the compressed domain speeds up processing, while the classification results remain highly competitive with raw-video approaches.


“…Over the past few years, visual question answering (VQA) has attracted substantial attention from both the computer vision and natural language processing communities [1][2][3][4][5][6][7][8]. Compared to the traditional tasks of computer vision or natural language processing, such as object detection [9], image captioning [10][11][12][13][14], tracking [15,16], face recognition [17,18], action recognition [19][20][21],…”

Section: Introduction (mentioning)

confidence: 99%

Visual Question Answering based on Local-Scene-Aware Referring Expression Generation

Kim; Lee; et al.

2021

Preprint

Self Cite


Visual question answering requires a deep understanding of both images and natural language. However, most methods mainly focus on visual concepts, such as the relationships between various objects. The limited use of object categories combined with their relationships or simple question embedding is insufficient for representing complex scenes and explaining decisions. To address this limitation, we propose the use of text expressions generated for images, because such expressions have few structural constraints and can provide richer descriptions of images. The generated expressions can be incorporated with visual features and question embedding to obtain the question-relevant answer. A joint-embedding multi-head attention network is also proposed to model three different information modalities with co-attention. We quantitatively and qualitatively evaluated the proposed method on the VQA v2 dataset and compared it with state-of-the-art methods in terms of answer prediction. The quality of the generated expressions was also evaluated on the RefCOCO, RefCOCO+, and RefCOCOg datasets. Experimental results demonstrate the effectiveness
