2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.01329
Disentangling Visual Embeddings for Attributes and Objects

Cited by 46 publications (18 citation statements)
References 35 publications
“…We evaluate the two sub-tasks in both box-given and box-free settings, on COCO for category detection and on VAW, LSA, and OVAD for attribute prediction. Specifically, the box-given setting is widely used in the attribute prediction and object recognition communities [8,29,38,40], where ground-truth bounding box annotations are assumed to be available for all objects, and the protocol only evaluates object category classification and multi-label attribute classification with the mAP metric; in contrast, the box-free setting poses a more challenging problem, as the model is also required to simultaneously localise the objects and classify the semantic category and attributes.…”
Section: Evaluation Protocol and Metrics (mentioning)
confidence: 99%
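As a concrete illustration of the box-given protocol described above, the sketch below scores multi-label attribute classification with mAP from per-box attribute scores. It is a minimal sketch, not the benchmarks' official evaluation code; the array names are illustrative and the use of scikit-learn's average_precision_score is an assumption.

    # Minimal sketch of box-given multi-label attribute evaluation:
    # every object has a ground-truth box, so only classification is scored.
    # Array names are illustrative; assumes scikit-learn is available.
    import numpy as np
    from sklearn.metrics import average_precision_score

    def attribute_map(scores: np.ndarray, labels: np.ndarray) -> float:
        """scores: (num_boxes, num_attributes) predicted probabilities.
        labels: (num_boxes, num_attributes) binary ground truth.
        Returns AP averaged over attributes that occur at least once."""
        aps = []
        for a in range(labels.shape[1]):
            if labels[:, a].sum() == 0:  # AP undefined for absent attributes
                continue
            aps.append(average_precision_score(labels[:, a], scores[:, a]))
        return float(np.mean(aps))

    # toy usage: 4 ground-truth boxes, 3 attribute classes
    scores = np.array([[0.9, 0.2, 0.1], [0.8, 0.7, 0.3],
                       [0.1, 0.6, 0.9], [0.2, 0.1, 0.8]])
    labels = np.array([[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]])
    print(f"attribute mAP: {attribute_map(scores, labels):.3f}")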
“…In the literature, numerous works have shown that understanding objects' attributes can greatly facilitate object recognition and detection, even with few or no examples of the visual objects [6,18,25,43,53]. For example, Farhadi et al. proposed to shift the goal of object recognition from 'naming' to 'description', which not only allows naming familiar objects by their attributes, but also saying something about unfamiliar ones ("hairy and four-legged", not just "unknown") [6]; Lampert et al. considered open-set object recognition, which aims to recognise objects from a human-specified high-level description, e.g., arbitrary semantic attributes such as shape, color, or even geographic information, instead of from training images [18]. However, the problem considered in these seminal works tends to be a simplification by today's standards: for example, attribute classifiers are often trained and evaluated on object-centric images under a closed-set scenario, i.e., assuming the bounding boxes/segmentation masks are given [13,29,38], or sometimes even that the object category is known a priori [26,29].…”
Section: Introduction (mentioning)
confidence: 99%
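The description-based recognition credited to Lampert et al. in this statement can be sketched in a few lines: predict per-image attribute probabilities with any classifier trained on seen objects, then score each unseen class by how well those probabilities match its human-specified attribute signature. The signatures, class names, and likelihood scoring below are illustrative assumptions loosely in the spirit of direct attribute prediction, not the cited method's exact formulation.

    # Sketch of attribute-based zero-shot recognition: unseen classes are
    # described by binary attribute signatures instead of training images.
    # Classes, attributes, and the scoring rule here are illustrative.
    import numpy as np

    # human-specified signatures: rows = classes, cols = attributes
    # e.g. attributes = [hairy, four-legged, striped]
    signatures = np.array([[1, 1, 0],    # a "bear"-like class
                           [1, 1, 1],    # a "zebra"-like class
                           [0, 0, 1]])   # a "striped chair"-like class

    def classify(attr_probs: np.ndarray) -> int:
        """attr_probs: per-attribute probabilities from an attribute
        classifier trained on seen objects. Scores each unseen class by
        the likelihood of its signature, assuming independent attributes."""
        eps = 1e-9
        log_lik = (signatures * np.log(attr_probs + eps)
                   + (1 - signatures) * np.log(1 - attr_probs + eps)).sum(axis=1)
        return int(np.argmax(log_lik))

    print(classify(np.array([0.9, 0.8, 0.1])))  # -> 0 (hairy, four-legged, unstriped)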
“…Li et al [15] disentangle attributes and objects with reversed attention. The second strategy directly predicts compositions by aligning images and textual labels in a shared space and searching for most similar compositions [13,26,35,42]. For example, Nagarajan et al [26] build a composition space by simulating all the visual changes of attributes performed on objects.…”
Section: Related Workmentioning
confidence: 99%
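At inference time, the shared-space strategy summarised here reduces to nearest-neighbour search over candidate attribute-object pairs. The sketch below assumes generic, pre-computed image and composition embeddings; the function and variable names are placeholders, not the cited models' components.

    # Sketch of composition retrieval in a shared embedding space:
    # embed the image, compare it to every attribute-object label
    # embedding, and return the most similar composition.
    import torch
    import torch.nn.functional as F

    def predict_composition(image_feat, comp_feats, compositions):
        """image_feat: (d,) image embedding; comp_feats: (K, d) embeddings
        of the K candidate attribute-object compositions."""
        sims = F.cosine_similarity(image_feat.unsqueeze(0), comp_feats, dim=1)
        return compositions[sims.argmax().item()]

    compositions = [("sliced", "apple"), ("ripe", "apple"), ("sliced", "bread")]
    d = 16
    image_feat = torch.randn(d)                      # stand-in image encoder output
    comp_feats = torch.randn(len(compositions), d)   # stand-in label embeddings
    print(predict_composition(image_feat, comp_feats, compositions))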
“…In compositional zero-shot learning, Yang et al. [25] disentangle images into states and objects based on the causal effects within compositions. Saini et al. [28] apply a state/object affinity network to contrastively disentangle shared states or objects from image pairs. Zhang et al. [27] recast CZSL as an out-of-distribution generalization problem and use domain alignment at the gradient level to disentangle images into object-invariant and attribute-invariant features.…”
Section: Knowledge Disentanglement in Zero-shot Learning (mentioning)
confidence: 99%
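One generic way to realise the pairwise disentanglement these works describe is to split a shared backbone feature with two projection heads and train contrastively, pulling together pairs that share a state in the state subspace (and likewise for objects). This is a hedged sketch of that general pattern, not Saini et al.'s affinity network or either cited architecture; all module and loss names are assumptions.

    # Generic sketch of pairwise contrastive disentanglement: one head
    # per primitive, with a pull/push loss over image pairs that share
    # (or do not share) that primitive. Illustrative, not a cited model.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DisentangleHeads(nn.Module):
        def __init__(self, feat_dim=512, emb_dim=128):
            super().__init__()
            self.state_head = nn.Linear(feat_dim, emb_dim)
            self.object_head = nn.Linear(feat_dim, emb_dim)

        def forward(self, feats):
            return (F.normalize(self.state_head(feats), dim=-1),
                    F.normalize(self.object_head(feats), dim=-1))

    def pair_loss(z_a, z_b, share: bool, margin=0.5):
        """Pull embeddings together if the pair shares the primitive,
        otherwise push them at least `margin` apart."""
        d = (z_a - z_b).pow(2).sum(-1)
        return d if share else F.relu(margin - d.sqrt()).pow(2)

    heads = DisentangleHeads()
    f1, f2 = torch.randn(512), torch.randn(512)  # e.g. "sliced apple", "sliced bread"
    s1, o1 = heads(f1)
    s2, o2 = heads(f2)
    loss = pair_loss(s1, s2, share=True) + pair_loss(o1, o2, share=False)
    loss.backward()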
“…Simple primitives in images are entangled visually [25]. Most current works [16], [19], [25], [26], [27], [28] treat states or objects as bases: they simulate the visual changes that contextuality causes on these bases in order to infer the possible entangled embedding.…”
Section: Introduction (mentioning)
confidence: 99%
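The "simulate visual changes on bases" idea can be sketched as a learned transformation that maps an object base embedding, conditioned on an attribute, to the entangled composition embedding, loosely in the spirit of the attribute-as-operator view of [26]. The concrete two-layer MLP and class name below are illustrative assumptions.

    # Sketch of simulating contextual visual change: an attribute acts as
    # a learned transformation on an object "base" embedding, producing
    # the entangled composition embedding. Architecture is illustrative.
    import torch
    import torch.nn as nn

    class AttributeOperator(nn.Module):
        def __init__(self, emb_dim=128):
            super().__init__()
            # one transformation conditioned on the attribute embedding
            self.compose = nn.Sequential(
                nn.Linear(2 * emb_dim, emb_dim), nn.ReLU(),
                nn.Linear(emb_dim, emb_dim))

        def forward(self, attr_emb, obj_emb):
            # entangled composition = f([attribute; object base])
            return self.compose(torch.cat([attr_emb, obj_emb], dim=-1))

    op = AttributeOperator()
    attr, obj = torch.randn(128), torch.randn(128)  # e.g. "wet", "dog"
    entangled = op(attr, obj)  # matched against image embeddings at train time
    print(entangled.shape)     # torch.Size([128])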