2022
DOI: 10.48550/arxiv.2203.08481
Preprint

Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding

Abstract: Visual grounding, i.e., localizing objects in images according to natural language queries, is an important topic in visual language understanding. The most effective approaches for this task are based on deep learning, which generally requires expensive, manually labeled image-query or patch-query pairs. To eliminate the heavy dependence on human annotations, we present a novel method, named Pseudo-Q, to automatically generate pseudo language queries for supervised training. Our method leverages an off-the-shel…

Cited by 1 publication (1 citation statement)
References 50 publications (137 reference statements)
“…Vision-Language Models. Recent years have witnessed the rapid development of vision-language models [25,8,33,24,13,34,27,30]. Those works usually first pre-train a neural network on a large-scale image-text dataset and then finetune the models for solving specific vision-language tasks.…”
Section: Related Work (mentioning)
confidence: 99%