Deep Spatial-Semantic Attention for Fine-Grained Sketch-Based Image Retrieval

Song, Jifei; Yu, Qian; Song, Yi-Zhe; Xiang, Tao; Hospedales, Timothy M.

doi:10.1109/iccv.2017.592

Cited by 235 publications

(227 citation statements)

References 40 publications

Mentioning: 226

“…As discussed in existing studies [44,42,6,5], CNNs may suffer from the sparsity of inputs (e.g., raster sketches), though they excel at building hierarchical representations of 2D inputs. Instead of struggling to estimate attention from binary images that contain limited information [34], we argue that additional cues, such as the temporal ordering and grouping information in vector sketches, are essential to learn reliable attention for strokes. In our method, we resort to RNNs for computing attention for each point in a vector sketch, and use our NLR module for in-network vector-to-raster conversion.…”

Section: Related Work (mentioning)

confidence: 98%

“…With a trained SVM, Schneider et al [31] qualitatively analyzed how stroke importance affects classification scores by iteratively removing each stroke from the corresponding raster sketch image. To automatically capture stroke importance during the learning process, researchers have attempted to adapt an attention mechanism in network design [34]. Attention mechanism has been widely used in many visual tasks, such as image classification [24,40,37,10], image caption [41,22] or Visual Question Answering (VQA) [25].…”

Section: Related Work (mentioning)

confidence: 99%

Sketch-R2CNN: An RNN-Rasterization-CNN Architecture for Vector Sketch Recognition

Zhang, Zheng, et al. 2021

IEEE Trans. Visual. Comput. Graphics

Freehand sketching is a dynamic process where points are sequentially sampled and grouped as strokes for sketch acquisition on electronic devices. To recognize a sketched object, most existing methods discard such important temporal ordering and grouping information from human and simply rasterize sketches into binary images for classification. In this paper, we propose a novel single-branch attentive network architecture RNN-Rasterization-CNN (Sketch-R2CNN for short) to fully leverage the dynamics in sketches for recognition. Sketch-R2CNN takes as input only a vector sketch with grouped sequences of points, and uses an RNN for stroke attention estimation in the vector space and a CNN for 2D feature extraction in the pixel space respectively. To bridge the gap between these two spaces in neural networks, we propose a neural line rasterization module to convert the vector sketch along with the attention estimated by RNN into a bitmap image, which is subsequently consumed by CNN. The neural line rasterization module is designed in a differentiable way to yield a unified pipeline for end-to-end learning. We perform experiments on existing large-scale sketch recognition benchmarks and show that by exploiting the sketch dynamics with the attention mechanism, our method is more robust and achieves better performance than the state-of-the-art methods.

“…The hand-crafted techniques mostly work with Bag-of-Words representations of sketch and edge map of natural image on top of some off-the-shelf features, such as SIFT [19], Gradient Field HOG [10], Histogram of Edge Local Orientations [25] or Learned Key Shapes [26], etc. This domain shift issue is further addressed by cross-domain deep learning-based methods [27,37], where they have used classical ranking losses, such as contrastive loss, triplet loss [32] or the more elegant HOLEF loss [30] within a siamese-like network. Based on the problem at hand, two separate tasks have been identified: (1) Fine-grained SBIR (FG-SBIR) aims to capture fine-grained similarities of sketch and photo [15,27,37] and (2) Coarse-grained SBIR (CG-SBIR) performs an instance-level search across multiple object categories [38,10,11,31,38], which has received a lot of attention due to its importance.…”

Section: Related Work (mentioning)

confidence: 99%

“…grained matching [37,30,24], large-scale hashing [17,16], cross-modal attention [5,30] to name a few. However, a common bottleneck identified by almost all sketch researches is that of data scarcity.…”

Section: Introduction (mentioning)

confidence: 99%

Doodle to Search: Practical Zero-Shot Sketch-Based Image Retrieval

Dey, Riba, Dutta, et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

In this paper, we investigate the problem of zero-shot sketch-based image retrieval (ZS-SBIR), where human sketches are used as queries to conduct retrieval of photos from unseen categories. We importantly advance prior arts by proposing a novel ZS-SBIR scenario that represents a firm step forward in its practical application. The new setting uniquely recognizes two important yet often neglected challenges of practical ZS-SBIR, (i) the large domain gap between amateur sketch and photo, and (ii) the necessity for moving towards large-scale retrieval. We first contribute to the community a novel ZS-SBIR dataset, QuickDraw-Extended, that consists of 330,000 sketches and 204,000 photos spanning across 110 categories. Highly abstract amateur human sketches are purposefully sourced to maximize the domain gap, instead of ones included in existing datasets that can often be semi-photorealistic. We then formulate a ZS-SBIR framework to jointly model sketches and photos into a common embedding space. A novel strategy to mine the mutual information among domains is specifically engineered to alleviate the domain gap. External semantic knowledge is further embedded to aid semantic transfer. We show that, rather surprisingly, retrieval performance significantly outperforms that of state-of-the-art on existing datasets that can already be achieved using a reduced version of our model. We further demonstrate the superior performance of our full model by comparing with a number of alternatives on the newly proposed dataset. The new dataset, plus all training and testing code of our model, will be publicly released to facilitate future research.

“…Our Semantic-Aware Knowledge prEservation (SAKE) preserves original domain knowledge of rich visual features (e.g., visual details of different subtypes of cars) which helps distinguishing the right photo candidates (e.g., SUV) from distractors (e.g., race car) in the unseen classes. neural networks into this field [22,45,21,43,37,33,30,44,42,39]. In the conventional setting, it is assumed that training and testing images are from the same set of object categories, in which scenario existing approaches achieved satisfying performance [22].…”

Section: Introduction (mentioning)

confidence: 99%

Semantic-Aware Knowledge Preservation for Zero-Shot Sketch-Based Image Retrieval

Liu, Xie, Wang, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Sketch-based image retrieval (SBIR) is widely recognized as an important vision problem which implies a wide range of real-world applications. Recently, research interests arise in solving this problem under the more realistic and challenging setting of zero-shot learning. In this paper, we investigate this problem from the viewpoint of domain adaptation which we show is critical in improving feature embedding in the zero-shot scenario. Based on a framework which starts with a pre-trained model on ImageNet and fine-tunes it on the training set of SBIR benchmark, we advocate the importance of preserving previously acquired knowledge, e.g., the rich discriminative features learned from ImageNet, to improve the model's transfer ability. For this purpose, we design an approach named Semantic-Aware Knowledge prEservation (SAKE), which fine-tunes the pre-trained model in an economical way and leverages semantic information, e.g., inter-class relationship, to achieve the goal of knowledge preservation. Zero-shot experiments on two extended SBIR datasets, TU-Berlin and Sketchy, verify the superior performance of our approach. Extensive diagnostic experiments validate that knowledge preserved benefits SBIR in zero-shot settings, as a large fraction of the performance gain is from the more properly structured feature embedding for photo images. Code is available at: https://github.com/qliu24/SAKE.
