Siliang Tang scite author profile

In this paper, we focus on the task of query-based video localization, i.e., localizing a query in a long, untrimmed video. The prevailing solutions for this problem fall into two categories: i) Top-down approaches pre-cut the video into a set of moment candidates, then perform classification and regression for each candidate; ii) Bottom-up approaches inject the whole query content into each video frame, then predict the probability of each frame being a ground-truth segment boundary (i.e., start or end). Both frameworks have shortcomings: top-down models suffer from heavy computation and are sensitive to heuristic rules, while the performance of bottom-up models has so far lagged behind that of their top-down counterparts. However, we argue that the performance of the bottom-up framework is severely underestimated because of unreasonable current designs, in both the backbone and the head network. To this end, we design a novel bottom-up model: Graph-FPN with Dense Predictions (GDP). For the backbone, GDP first generates a frame feature pyramid to capture multi-level semantics, then uses graph convolution to encode the plentiful scene relationships, which incidentally mitigates the semantic gaps in the multi-scale feature pyramid. For the head network, GDP regards all frames falling within the ground-truth segment as foreground, and each foreground frame regresses the unique distances from its location to the bi-directional boundaries. Extensive experiments on two challenging query-based video localization tasks (natural language video localization and video relocalization), involving four challenging benchmarks (TACoS, Charades-STA, ActivityNet Captions, and Activity-VRL), show that GDP surpasses the state-of-the-art top-down models.
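
The dense-prediction idea in the head network can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of integer frame indices, and the inclusive segment convention are all assumptions made for illustration. Each frame inside the ground-truth segment gets a pair of regression targets (its distance to the start and to the end), and any single foreground frame's pair is enough to decode the segment.

```python
from typing import List, Optional, Tuple


def dense_boundary_targets(
    num_frames: int, start: int, end: int
) -> List[Optional[Tuple[int, int]]]:
    """Build per-frame regression targets (hypothetical helper).

    Frames inside the inclusive ground-truth segment [start, end] are
    foreground and get (distance to start, distance to end).
    Background frames get None.
    """
    if not (0 <= start <= end < num_frames):
        raise ValueError("segment must satisfy 0 <= start <= end < num_frames")
    targets: List[Optional[Tuple[int, int]]] = []
    for t in range(num_frames):
        if start <= t <= end:
            targets.append((t - start, end - t))
        else:
            targets.append(None)
    return targets


def decode_segment(t: int, d_start: int, d_end: int) -> Tuple[int, int]:
    """Recover (start, end) from one foreground frame's two distances."""
    return (t - d_start, t + d_end)


if __name__ == "__main__":
    targets = dense_boundary_targets(num_frames=6, start=2, end=4)
    for t, target in enumerate(targets):
        print(t, target)
```

Under these assumptions every foreground frame yields a training signal, whereas a boundary-classification head would only supervise the start and end frames.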

Sparse Multi-Modal Hashing

Yang et al. 2014

IEEE Trans. Multimedia

130

Deep Sequential Feature Learning in Clinical Image Classification of Infectious Keratitis

Kong, Xie et al. 2021

Engineering

Learning Dynamic Context Augmentation for Global Entity Linking

Yang, Gu, Lin et al. 2019

Despite the recent success of collective entity linking (EL) methods, these "global" inference methods may yield sub-optimal results when the "all-mention coherence" assumption breaks, and they often suffer from high computational cost at the inference stage due to the complex search space. In this paper, we propose a simple yet effective solution, called Dynamic Context Augmentation (DCA), for collective EL, which requires only one pass through the mentions in a document. DCA sequentially accumulates context information to make efficient, collective inference, and can cope with different local EL models as a plug-and-enhance module. We explore both supervised and reinforcement learning strategies for learning the DCA model. Extensive experiments show the effectiveness of our model with different learning settings, base models, decision orders and attention mechanisms.

NTIRE 2020 Challenge on Real Image Denoising: Dataset, Methods and Results

Abdelhamed, Afifi, Timofte et al. 2020

Rethinking the Bottom-Up Framework for Query-Based Video Localization
