Zero-Shot Sketch-Based Image Retrieval via Graph Convolution Network

Zhang, Zhaolong; Zhang, Yuejie; Feng, Rui; Zhang, Tao; Fan, Weiguo

doi:10.1609/aaai.v34i07.6993

Cited by 57 publications

(37 citation statements)

References 19 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Most existing ZS-SBIR methods follow the common space paradigm and pre-define class-prototypes with language models. Graph [Zhang et al, 2020;Shen et al, 2018], cycle consistency [Dutta and Akata, 2019;Deng et al, 2020] and content-style disentanglement [Dutta and Biswas, 2019]) are employed to learn the projection to map sketches/photos close to such prototypes. [Dutta and Akata, 2019] proposes a selection layer to refine the prototypes and reduce the dimensionality of retrieval features.…”

Section: Related Workmentioning

confidence: 99%

“…[Dutta and Akata, 2019] proposes a selection layer to refine the prototypes and reduce the dimensionality of retrieval features. Graph-based method [Zhang et al, 2020] adjust the language-based adjacency matrix with visual information. proposes a teacherstudent framework to preserve discriminative representations from ImageNet and coordinates the representations close to language-based prototypes.…”

Section: Related Workmentioning

confidence: 99%

“…Then, they learn to project both sketches and photos close to their corresponding class-prototypes. Based on this paradigm, various architectures (e.g., GANs [Dutta and Biswas, 2019], graph [Shen et al, 2018;Zhang et al, 2020], and cycle reconstruction [Dutta and Akata, 2019;Deng et al, 2020]) have been employed to estimate the projection. They expect that such learned mappings can leverage the side-information from language model to project unseen class sketches/photos close to their semantically similar seen classes, and thus the distances among the sketches and photos from the same category are minimized.…”

Section: Introductionmentioning

confidence: 99%

See 2 more Smart Citations

Norm-guided Adaptive Visual Embedding for Zero-Shot Sketch-Based Image Retrieval

Wang

Shi

Chen

et al. 2021

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence

View full text Add to dashboard Cite

Zero-shot sketch-based image retrieval (ZS-SBIR), which aims to retrieve photos with sketches under the zero-shot scenario, has shown extraordinary talents in real-world applications. Most existing methods leverage language models to generate class-prototypes and use them to arrange the locations of all categories in the common space for photos and sketches. Although great progress has been made, few of them consider whether such pre-defined prototypes are necessary for ZS-SBIR, where locations of unseen class samples in the embedding space are actually determined by visual appearance and a visual embedding actually performs better. To this end, we propose a novel Norm-guided Adaptive Visual Embedding (NAVE) model, for adaptively building the common space based on visual similarity instead of language-based pre-defined prototypes. To further enhance the representation quality of unseen classes for both photo and sketch modality, modality norm discrepancy and noisy label regularizer are jointly employed to measure and repair the modality bias of the learned common embedding. Experiments on two challenging datasets demonstrate the superiority of our NAVE over state-of-the-art competitors.

show abstract

Section: Related Workmentioning

confidence: 99%

Section: Related Workmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

Norm-guided Adaptive Visual Embedding for Zero-Shot Sketch-Based Image Retrieval

Wang

Shi

Chen

et al. 2021

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence

View full text Add to dashboard Cite

show abstract

“…The utility of semantic space has been validated in various previously proposed frameworks, such as [9], [10], [19], [21], to name a few. The visual space is aligned with the semantic space to aid the training process and learn from the topology of the semantic space.…”

Section: Introductionmentioning

confidence: 99%

“…However, while [9], [10], [19] used the semantic space, they made the semantic space latent and learnable causing the network to eventually loose the classwise topology information as the network is trained for more number of epochs. To preserve the original topology of the semantic space, [21], [22] proposed using a graph convolution network (GCN) [23]. While in [22] the authors use a GCN directly on the semantic graph, in [21] the authors create a fully-connected graph whose edge weights correspond to the semantic distances and the node features comprise of classwise visual features.…”

Section: Introductionmentioning

confidence: 99%

Zero-Shot Sketch Based Image Retrieval using Graph Transformer

Sumrit¹,

Chaudhuri²,

Banerjee³

2022

Preprint

View full text Add to dashboard Cite

The performance of a zero-shot sketch-based image retrieval (ZS-SBIR) task is primarily affected by two challenges. The substantial domain gap between image and sketch features needs to be bridged, while at the same time the side information has to be chosen tactfully. Existing literature has shown that varying the semantic side information greatly affects the performance of ZS-SBIR. To this end, we propose a novel graph transformer based zero-shot sketch-based image retrieval (GTZSR) framework for solving ZS-SBIR tasks which uses a novel graph transformer to preserve the topology of the classes in the semantic space and propagates the context-graph of the classes within the embedding features of the visual space. To bridge the domain gap between the visual features, we propose minimizing the Wasserstein distance between images and sketches in a learned domain-shared space. We also propose a novel compatibility loss that further aligns the two visual domains by bridging the domain gap of one class with respect to the domain gap of all other classes in the training set. Experimental results obtained on the extended Sketchy, TU-Berlin, and QuickDraw datasets exhibit sharp improvements over the existing state-ofthe-art methods in both ZS-SBIR and generalized ZS-SBIR.

show abstract