BDA-SketRet: Bi-Level Domain Adaptation for Zero-Shot SBIR

Chaudhuri, Ushasi; Chavan, Ruchika; Banerjee, Biplab; Dutta, Anjan; Akata, Zeynep

doi:10.48550/arxiv.2201.06570

Cited by 2 publications

(4 citation statements)

References 40 publications

“…Competitors. We compare our method with several baselines, including ZSIH [50], CC-DG [40], DOODLE [16], SEM-PCYC [19], SAKE [34], SketchGCN [67], StyleGuide [20], PDFD [13], DSN [57], BDA-SketRet [8], SBTKNet [55], Sketch3T [44], TVT [54] and ViT-Ret/ViT-Vis [18] adapted by us. ViT-Ret means replacing the class token in ViT with a retrieval token used for matching; while ViT-Vis uses the visual tokens for matching.…”

Section: Category-level ZS-SBIR (mentioning)

confidence: 99%

Zero-Shot Everything Sketch-Based Image Retrieval, and in Explainable Style

Lin, Li, et al. 2023

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

This paper studies the problem of zero-shot sketch-based image retrieval (ZS-SBIR), however with two significant differentiators to prior art: (i) we tackle all variants (inter-category, intra-category, and cross datasets) of ZS-SBIR with just one network ("everything"), and (ii) we would really like to understand how this sketch-photo matching operates ("explainable"). Our key innovation lies with the realization that such a cross-modal matching problem could be reduced to comparisons of groups of key local patches, akin to the seasoned "bag-of-words" paradigm. Just with this change, we are able to achieve both of the aforementioned goals, with the added benefit of no longer requiring external semantic knowledge. Technically, ours is a transformer-based cross-modal network, with three novel components: (i) a self-attention module with a learnable tokenizer to produce visual tokens that correspond to the most informative local regions, (ii) a cross-attention module to compute local correspondences between the visual tokens across two modalities, and finally (iii) a kernel-based relation network to assemble local putative matches and produce an overall similarity metric for a sketch-photo pair. Experiments show ours indeed delivers superior performances across all ZS-SBIR settings. The all-important explainable goal is elegantly achieved by visualizing cross-modal token correspondences, and for the first time, via sketch to photo synthesis by universal replacement of all matched photo patches. Code and model are available at https://github.com/buptLinfy/ZSE-SBIR.
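As a rough illustration of the "comparisons of groups of key local patches" idea in the abstract above, the following Python sketch scores a sketch-photo pair from token-level similarities. It is not the authors' method: the learned cross-attention and kernel-based relation network are replaced here by a plain cosine similarity, a hard best-match per sketch token, and a mean. The function names and the toy token vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def local_match_score(sketch_tokens, photo_tokens):
    """Mean, over sketch tokens, of the best cosine match among photo tokens.

    Assumption: a hand-written stand-in for learned local correspondence,
    used only to show how local matches can be pooled into one pair score.
    """
    best = [max(cosine(s, p) for p in photo_tokens) for s in sketch_tokens]
    return sum(best) / len(best)


# Toy 2-D "visual tokens" (hypothetical values, not real features).
sketch = [[1.0, 0.0], [0.0, 1.0]]
photo_a = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]  # contains a match for each sketch token
photo_b = [[1.0, 1.0]]                          # only a partial match for either token

score_a = local_match_score(sketch, photo_a)
score_b = local_match_score(sketch, photo_b)
```

Under these assumptions, ranking gallery photos by `local_match_score` places `photo_a` ahead of `photo_b`, since every sketch token finds a closely aligned photo token in `photo_a`.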

“…Zero-shot sketch-based image retrieval (ZS-SBIR) is a central problem to sketch understanding [8,16,19,20,28,34,50,54,55,57,60,67]. The zero-shot setting is largely driven by the prevailing data scarcity problem of human sketches [19,28,58]; they are much harder to acquire compared with photos.…”

Section: Introduction (mentioning)

confidence: 99%

“…IV. RESULTS AND DISCUSSIONS. We compare the performance of the proposed model with some of the existing state-of-the-art frameworks of [9]-[12], [14], [17], [18], [21], [34]. We also lay down the performances of some of the notable works in SBIR that uses the same datasets to show how the proposed model which solves a more challenging ZS-SBIR problem achieves comparable performance.…”

Section: Sketchy-ext TU Berlin-ext (mentioning)

confidence: 99%

“…On the other hand, some of the notable discriminative models involve [10], [16]-[18], to name a few. The Doodle2Search [10] uses a triplet architecture and uses gradient reversal layers to enforce learning domain agnostic features from image and sketches.…”

Section: Introduction (mentioning)

confidence: 99%

Zero-Shot Sketch Based Image Retrieval using Graph Transformer

Sumrit, Chaudhuri, Banerjee 2022

Preprint

Self Cite

The performance of a zero-shot sketch-based image retrieval (ZS-SBIR) task is primarily affected by two challenges. The substantial domain gap between image and sketch features needs to be bridged, while at the same time the side information has to be chosen tactfully. Existing literature has shown that varying the semantic side information greatly affects the performance of ZS-SBIR. To this end, we propose a novel graph transformer based zero-shot sketch-based image retrieval (GTZSR) framework for solving ZS-SBIR tasks which uses a novel graph transformer to preserve the topology of the classes in the semantic space and propagates the context-graph of the classes within the embedding features of the visual space. To bridge the domain gap between the visual features, we propose minimizing the Wasserstein distance between images and sketches in a learned domain-shared space. We also propose a novel compatibility loss that further aligns the two visual domains by bridging the domain gap of one class with respect to the domain gap of all other classes in the training set. Experimental results obtained on the extended Sketchy, TU-Berlin, and QuickDraw datasets exhibit sharp improvements over the existing state-of-the-art methods in both ZS-SBIR and generalized ZS-SBIR.
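To make the Wasserstein distance mentioned in the abstract above concrete, here is a minimal Python sketch. It assumes one-dimensional, equal-sized samples, for which the 1-Wasserstein distance is the mean absolute difference of the sorted values. The per-dimension average is only a crude surrogate for a distance between multivariate feature sets, and is not the GTZSR training loss. All function names and feature values are hypothetical.

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-sized 1-D empirical samples.

    For equal-sized samples the optimal transport plan pairs the i-th
    smallest value of xs with the i-th smallest value of ys.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal-sized samples")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def mean_marginal_wasserstein(feats_a, feats_b):
    """Average of per-dimension 1-D distances (an assumed surrogate only;
    it ignores correlations between feature dimensions)."""
    dims = len(feats_a[0])
    total = 0.0
    for d in range(dims):
        total += wasserstein_1d([f[d] for f in feats_a], [f[d] for f in feats_b])
    return total / dims


# Toy 2-D embeddings for two domains (hypothetical values).
image_feats = [[0.0, 1.0], [1.0, 3.0]]
sketch_feats = [[1.0, 1.0], [2.0, 3.0]]

shift_only = wasserstein_1d([0.0, 1.0, 3.0], [5.0, 6.0, 8.0])
domain_gap = mean_marginal_wasserstein(image_feats, sketch_feats)
```

In this toy case the first feature dimension differs by a constant shift of 1 between the two domains and the second is identical, so the surrogate gap is the average of 1 and 0. A training procedure that minimizes such a quantity would push the two feature distributions toward each other.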
