2021
DOI: 10.1109/tip.2021.3120038
Text-Based Localization of Moments in a Video Corpus

Cited by 15 publications (3 citation statements)
References 59 publications
“…Many follow-ups further boost the zero-shot transferable ability, e.g., CoOp [57], CLIP-Adapter [14], and Tip-adapter [55]. In video domains, similar idea has also been explored for transferable representation learning [26], and text based action localization [32]. CLIP is used recently in action recognition [43] and TAD [19,30].…”
Section: Related Work
confidence: 99%
“…Since then, many follow-ups have been proposed, including improved training strategy (e.g., CoOp [54], CLIP-Adapter [12], Tip-adapter [50]). In video domains, similar idea has also been explored for transferable representation learning [24], text based action localization [32]. CLIP has also been used very recently in action recognition (e.g., ActionCLIP [41]) and TAD [17].…”
Section: Related Work
confidence: 99%
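The zero-shot transfer idea these statements attribute to CLIP-based localization work can be made concrete with a small sketch. The following is a hypothetical illustration of scoring video frames against a text query with the public OpenAI `clip` package, not the method of any cited paper; the frame list, window size, and sliding-window scoring are assumptions for illustration.

```python
# Hypothetical sketch: zero-shot text-based moment scoring with CLIP.
# The sliding-window localization below is an illustrative assumption,
# not the procedure of the cited papers.
import torch
import clip  # OpenAI CLIP package

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def frame_scores(frames, query):
    """Cosine similarity between each sampled frame (PIL Image) and the query."""
    images = torch.stack([preprocess(f) for f in frames]).to(device)
    tokens = clip.tokenize([query]).to(device)
    with torch.no_grad():
        img_feat = model.encode_image(images)
        txt_feat = model.encode_text(tokens)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    return (img_feat @ txt_feat.T).squeeze(-1)  # (num_frames,)

def best_moment(scores, window=8):
    """Slide a fixed window over frame scores; return (start, end) frame indices."""
    sums = scores.unfold(0, window, 1).mean(dim=-1)  # mean score per window
    start = int(sums.argmax())
    return start, start + window
```

Given a list of sampled frames and a sentence query, `best_moment(frame_scores(frames, query))` returns the highest-scoring temporal window; more elaborate proposal generation is what the cited localization and TAD methods add on top of this kind of similarity signal.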
“…A few recent methods [38, 199, 215, 252-254] tackle the VCMR problem. Zhang et al. [199] develop a hierarchical multi-modal encoder to learn multimodal interactions at both coarse- and fine-grained granularities.…”
Section: Video Corpus Moment Retrieval
confidence: 99%
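To make the coarse- plus fine-grained matching idea in this statement concrete, here is a minimal PyTorch sketch that fuses a video-level (coarse) similarity with clip-level (fine) similarities. The layer sizes, the softmax attention over clips, and the additive fusion are assumptions for illustration, not the hierarchical encoder of Zhang et al. [199].

```python
# Minimal sketch of coarse- plus fine-grained video-text matching.
# All design choices here (projection layers, attention, additive fusion)
# are illustrative assumptions, not the cited architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoarseFineMatcher(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.clip_proj = nn.Linear(dim, dim)   # projects per-clip features
        self.query_proj = nn.Linear(dim, dim)  # projects the sentence feature

    def forward(self, clip_feats, query_feat):
        """
        clip_feats: (num_clips, dim) features for the clips of one video
        query_feat: (dim,) sentence-level feature for the text query
        Returns a scalar video-query relevance score.
        """
        q = F.normalize(self.query_proj(query_feat), dim=-1)
        c = F.normalize(self.clip_proj(clip_feats), dim=-1)
        fine = c @ q  # (num_clips,) per-clip (fine-grained) similarity
        # Video-level (coarse) similarity from mean-pooled clip features.
        coarse = F.normalize(clip_feats.mean(dim=0), dim=-1) @ F.normalize(query_feat, dim=-1)
        # Attend over clips with the fine scores, then fuse both granularities.
        attn = fine.softmax(dim=0)
        return coarse + (attn * fine).sum()
```

Ranking every video in a corpus by this fused score, then localizing the moment inside the top-ranked videos, is the two-stage pattern VCMR methods like those cited follow.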