2023
DOI: 10.48550/arxiv.2301.09506
Preprint

OvarNet: Towards Open-vocabulary Object Attribute Recognition

Abstract: In this paper, we consider the problem of simultaneously detecting objects and inferring their visual attributes in an image, even for those with no manual annotations provided at the training stage, resembling an open-vocabulary scenario. To achieve this goal, we make the following contributions: (i) we start with a naïve two-stage approach for open-vocabulary object detection and attribute classification, termed CLIP-Attr. The candidate objects are first proposed with an offline RPN and later classified for …

Cited by 1 publication
(1 citation statement)
References 30 publications
“…Several studies have been conducted consecutively on learning multimodal structural knowledge, commencing with visual concept recognition as a basic task (Deng et al, 2009; Davoodi et al, 2023; Kamath et al, 2022), progressing to object grounding (Yu and Ballard, 2004; Shao et al, 2019; Krasin et al, 2017), object attribute detection (Chen et al, 2023a; Patil and Abhyankar, 2023; Bravo et al, 2022), object-object relation detection (Krishna et al, 2016), and finally visual event understanding (Yatskar et al, 2016, 2017; Cho et al, 2022;…”
Section: A Adaptation Of Downstream Tasks
Citation type: mentioning
confidence: 99%