2020
DOI: 10.1007/978-3-030-58565-5_36

Hierarchical Visual-Textual Graph for Temporal Activity Localization via Language

citations
Cited by 40 publications
(32 citation statements)
references
References 45 publications
“…This section will compare our method with several state of the art methods. Since our model belongs to the one-stage methods, we mainly compare it with one-stage methods, which are ABLR [32], ExCL [9], DEBUG [18], TMLGA [24], HVTG [5], VSLnet [34], GDP [4], DRN [33], FIAN [22] and VLG-Net [21]. To further illustrate the effect, we also give the score of some two-stages methods including CTRL [8], SLTA [12], ACRN [16], CBP [26] and 2D-TAN [35].…”
Section: Comparison With State-of-the-art Methods (mentioning)
confidence: 99%
“…The majority of existing methods for video grounding can be categorized into two families: 1) proposal-based methods [2,5,13,14,17,25,26,33,43,45,48,50,54,56,57,58], which generate a bunch of proposals in advance and select the best match with target spans, and 2) proposal-free methods [6,7,8,15,29,31,36,42,45,52,53,55], which estimate start and end timestamps aligned to the given description directly. The proposal-based approaches generally show strong performance at the expense of prohibitive cost of proposal generation.…”
Section: Introduction (mentioning)
confidence: 99%
“…Recently, [52] proposes obtaining contextualized activity representations in the video based on a language-conditioned message-passing algorithm where the activities are the composition of interactions between humans and objects in the scene. [53] also utilizes graph networks to model interactions between the objects and words as well as among the objects. These two jobs are similar to our model as they also utilize semantic features and devise graph operation in the network, but the organization and composition of their graphs are different from ours and they also fail to recognize the correspondence of objects in the time domain.…”
Section: Review Of Natural Language Video Localization (mentioning)
confidence: 99%
“…While ExCL [48] and TMLGA [49] model the prediction of the span boundaries as the pure classification task. HVTG [53] and DORi [52] are the most similar work to ours as they also exploit the semantic feature of the scene. [53] utilizes graph networks to model the relationships of objects in the scene and the interactions between the objects and words.…”
Section: Experiments 431 Datasets (mentioning)
confidence: 99%