This paper tackles the problem of temporal language localization in videos, which aims to identify the start and end points of the moment described by a natural language sentence in an untrimmed video. The task is nontrivial, since it requires not only a comprehensive understanding of both the video and the sentence query, but also an accurate capture of the semantic correspondence between them. Existing efforts mainly center on exploring the sequential relations among video clips and query words to reason about the video and the sentence query, neglecting other intra-modal relations (e.g., the semantic similarity among video clips and the syntactic dependencies among the query words). Towards this end, we propose a Multimodal Interaction Graph Convolutional Network (MIGCN), which jointly explores the complex intra-modal relations and inter-modal interactions residing in the video and sentence query, to facilitate both the understanding of the two modalities and the capture of the semantic correspondence between them. In addition, we devise an adaptive context-aware localization method, where context information is incorporated into the candidate moments and multiscale fully connected layers are designed to rank the generated coarse candidate moments of different lengths and to adjust their boundaries. Extensive experiments on the Charades-STA and ActivityNet datasets demonstrate the promising performance and superior efficiency of our model.
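To make the idea of message passing over a joint multimodal graph concrete, the following is a minimal sketch, not the authors' implementation: it builds a single node set from clip and word features, forms soft edges that stand in for the intra-modal (clip-clip, word-word) and inter-modal (clip-word) relations described above, and applies one graph-convolution step. The class name `MultimodalGraphConv`, the similarity-based adjacency, and all dimensions are illustrative assumptions.

```python
# Illustrative sketch only: a single graph-convolution layer over a joint graph
# of video-clip nodes and query-word nodes. Names and design choices here are
# assumptions for exposition, not the MIGCN reference code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultimodalGraphConv(nn.Module):
    """One message-passing step over clip and word nodes.

    The soft adjacency mixes three hypothetical edge types in one matrix:
    intra-video (clip-clip similarity), intra-query (word-word relations),
    and inter-modal (clip-word) interactions.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, clips: torch.Tensor, words: torch.Tensor):
        # clips: (num_clips, dim), words: (num_words, dim)
        nodes = torch.cat([clips, words], dim=0)             # joint node set
        normed = F.normalize(nodes, dim=-1)
        adj = F.softmax(normed @ normed.t(), dim=-1)         # soft, fully connected edges
        out = F.relu(self.proj(adj @ nodes))                 # aggregate, then transform
        return out[: clips.size(0)], out[clips.size(0):]


if __name__ == "__main__":
    layer = MultimodalGraphConv(dim=256)
    clips, words = torch.randn(64, 256), torch.randn(12, 256)
    new_clips, new_words = layer(clips, words)
    print(new_clips.shape, new_words.shape)  # (64, 256) and (12, 256)
```

In this toy version the edges are computed from feature similarity alone; the paper's point is that explicit semantic-similarity and syntactic-dependency structures would replace or constrain this dense adjacency.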