Knowledge-Guided Multi-Label Few-Shot Learning for General Image Recognition

Chen, Tianshui; Lin, Liang; Hui, Xiaolu; Chen, Riquan; Wu, Hefeng

doi:10.1109/tpami.2020.3025814

Cited by 141 publications

(72 citation statements)

References 60 publications


“…In literature, a large number of traditional approaches have been proposed and they can be divided into exemplar-based methods [43,18] and regression-based methods [44]. In the past decade, deep neural networks (DNN) have achieved great success in various tasks [45,46,47,48,49,50], and many researchers have also applied DNN for sketch synthesis [51,52,53,54,55,56,33]. For example, Zhang et al [13] developed an end-to-end fully convolutional network to model the mapping between photos and sketches.…”

Section: Related Work (mentioning)

confidence: 99%

Unconstrained Face Sketch Synthesis via Perception-Adaptive Network and A New Benchmark

Lin, Liu, Wu et al. 2021

Preprint


Face sketch generation has attracted much attention in the field of visual computing. However, existing methods are either limited to constrained conditions or rely heavily on various preprocessing steps to deal with in-the-wild cases. In this paper, we argue that accurately perceiving the facial region and facial components is crucial for unconstrained sketch synthesis. To this end, we propose a novel Perception-Adaptive Network (PANet), which can generate high-quality face sketches under unconstrained conditions in an end-to-end scheme. Specifically, our PANet is composed of: i) a Fully Convolutional Encoder for hierarchical feature extraction, ii) a Face-Adaptive Perceiving Decoder for extracting the potential facial region and handling face variations, and iii) a Component-Adaptive Perceiving Module for facial-component-aware feature representation learning. To facilitate further research on unconstrained face sketch synthesis, we introduce a new benchmark termed WildSketch, which contains 800 face photo-sketch pairs with large variations in pose, expression, ethnic origin, background, and illumination. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance under both constrained and unconstrained conditions. Our source code and the WildSketch benchmark will be released on the project page http://lingboliu.com/unconstrained_face_sketch.html.


“…It relies on a data augmentation strategy which generates synthesized feature vectors via label-set operations. KGGR (Chen et al 2020) uses a GCN to take label dependencies into account, where labels are modelled as nodes and two nodes are connected if the corresponding labels tend to co-occur. The strength of these label dependencies is normally estimated from co-occurrence statistics, but for labels with limited training data, dependency strength is instead estimated based on GloVe word vectors (Pennington, Socher, and Manning 2014).…”

Section: Multi-label Few-shot Image Classification (mentioning)

confidence: 99%
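The label-dependency estimate described in the snippet above can be sketched as follows. This is a minimal illustration under stated assumptions, not KGGR's actual implementation: the function names, the `min_count` cut-off that decides when a label has "limited training data", the row-wise choice of fallback, and the clipping of negative cosine similarities are all hypothetical. The `propagate` helper shows one simplified graph-convolution step over the resulting label graph (no learned weights or non-linearity).

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def label_dependency_matrix(label_sets, num_labels, word_vecs, min_count=5):
    """Hypothetical sketch: A[i][j] = dependency strength from label i to label j.

    label_sets: one set of label indices per training image.
    word_vecs:  one word vector (e.g. GloVe) per label.
    Rows of labels seen at least `min_count` times use the co-occurrence
    estimate P(j | i) = count(i, j) / count(i); rows of rarer labels fall
    back to word-vector cosine similarity (assumption: clipped at zero).
    """
    counts = [0] * num_labels
    co = [[0] * num_labels for _ in range(num_labels)]
    for labels in label_sets:
        for i in labels:
            counts[i] += 1
            for j in labels:
                if i != j:
                    co[i][j] += 1

    A = [[0.0] * num_labels for _ in range(num_labels)]
    for i in range(num_labels):
        for j in range(num_labels):
            if i == j:
                continue  # self-loops are added later, in propagate()
            if counts[i] >= min_count:
                A[i][j] = co[i][j] / counts[i]
            else:
                A[i][j] = max(0.0, cosine(word_vecs[i], word_vecs[j]))
    return A


def propagate(A, H):
    """One simplified graph-convolution step: each label node takes a
    weighted average of its own and its neighbours' features, using the
    row-normalised matrix A + I as weights."""
    n = len(A)
    out = []
    for i in range(n):
        w = [A[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        total = sum(w)  # >= 1 because of the self-loop
        out.append([sum(w[j] * H[j][k] for j in range(n)) / total
                    for k in range(len(H[0]))])
    return out
```

For example, with `label_sets = [{0, 1}, {0, 1}, {0}, {2}]` and `min_count=2`, label 2 has been seen only once, so its row is filled from word-vector similarity while labels 0 and 1 use co-occurrence counts.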

“…In ML-FSIC, this strategy is difficult to adopt, since each image may have multiple labels. The idea of setting N = |C_base| during training and N = |C_novel| during testing conforms to the strategy that was used by Alfassy et al (2019) and Chen et al (2020). However, Alfassy et al (2019) fix the number of training examples per label as K, with K ∈ {1, 5}, which has two important shortcomings.…”

Section: Problem Setting (mentioning)

confidence: 99%
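To make the "K examples per label" setting above concrete, here is a minimal sketch of a greedy support-set sampler for a multi-label episode. It is an illustration only, not the sampling procedure of Alfassy et al (2019) or Chen et al (2020); the function name, the greedy strategy, and the seeded shuffle are hypothetical. `label_space` plays the role of C_base during training or C_novel during testing, and `k` would be 1 or 5. Because one image can be a positive example for several labels at once, the returned counts can exceed `k` for some labels.

```python
import random


def sample_support_set(dataset, label_space, k, seed=0):
    """Hypothetical greedy sampler: add images until every label in
    `label_space` has at least k positive examples (or the pool runs out).

    dataset:     list of (image_id, set_of_labels) pairs.
    label_space: the N labels of the episode (e.g. C_base or C_novel).
    Returns (chosen_ids, counts); counts may exceed k for some labels.
    """
    rng = random.Random(seed)
    pool = list(dataset)
    rng.shuffle(pool)
    wanted = set(label_space)
    counts = {c: 0 for c in label_space}
    chosen = []
    for image_id, image_labels in pool:
        if all(v >= k for v in counts.values()):
            break  # every label already has k examples
        relevant = image_labels & wanted
        # Only take the image if it helps a label that is still short of k;
        # it then counts towards every episode label it carries.
        if any(counts[c] < k for c in relevant):
            chosen.append(image_id)
            for c in relevant:
                counts[c] += 1
    return chosen, counts
```

Labels outside `label_space` (for example, base labels present on a novel-class test image) are simply ignored when counting.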

Inferring Prototypes for Multi-Label Few-Shot Image Classification with Word Vector Guided Attention

Ke, Zhang, Hou et al. 2021

Preprint


Multi-label few-shot image classification (ML-FSIC) is the task of assigning descriptive labels to previously unseen images, based on a small number of training examples. A key feature of the multi-label setting is that images often have multiple labels, which typically refer to different regions of the image. When estimating prototypes, in a metric-based setting, it is thus important to determine which regions are relevant for which labels, but the limited amount of training data makes this highly challenging. As a solution, in this paper, we propose to use word embeddings as a form of prior knowledge about the meaning of the labels. In particular, visual prototypes are obtained by aggregating the local feature maps of the support images, using an attention mechanism that relies on the label embeddings. As an important advantage, our model can infer prototypes for unseen labels without the need for fine-tuning any model parameters, which demonstrates its strong generalization abilities. Experiments on COCO and PASCAL VOC furthermore show that our model substantially improves the current state-of-the-art.
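The prototype computation described in this abstract (attention over the local feature maps of support images, guided by label embeddings) can be sketched roughly as below. This is a simplified sketch under stated assumptions, not the authors' model: we assume each label's word embedding has already been mapped to a query vector in the visual feature space, use plain dot-product attention with a softmax over spatial locations, average over support images, and score a query image by a dot product with each prototype. All function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(regions, query):
    """Attention-pool one image's local features (one vector per spatial
    location), weighting each location by its similarity to the label query."""
    weights = softmax([dot(r, query) for r in regions])
    dim = len(regions[0])
    return [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]


def label_prototype(support_images, query):
    """Prototype of one label: mean of the attention-pooled features of its
    support images (each image given as a list of local feature vectors)."""
    pooled = [attend(regions, query) for regions in support_images]
    dim = len(pooled[0])
    return [sum(p[d] for p in pooled) / len(pooled) for d in range(dim)]


def score_labels(image_regions, prototypes, queries):
    """Score every label for a query image: pool its regions with the label's
    query, then take the dot product with that label's prototype."""
    return {
        label: dot(attend(image_regions, queries[label]), prototypes[label])
        for label in prototypes
    }
```

Because a prototype only needs the label's query vector and a few support images, a new label can be added without retraining anything in this sketch, which mirrors the abstract's claim that prototypes for unseen labels are inferred without fine-tuning.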


“…Multi-label image recognition receives increasing attention (Wei et al 2016; Chen et al 2020) since it is more practical and necessary than its single-label counterpart. To solve this task, lots of efforts are dedicated to discovering discriminative local regions for feature enhancement by object proposal algorithms (Wei et al 2016; Yang et al 2016) or visual attention mechanisms (Ba, Mnih, and Kavukcuoglu 2014; Chen et al 2018b).…”

Section: Related Work (mentioning)

confidence: 99%

“…Recently, lots of efforts (Chen et al 2019c,a, 2020) are dedicated to the task of multi-label image recognition as it benefits various applications ranging from content-based image retrieval and recommendation systems to surveillance systems and assistive robots. Despite achieving impressive progress, current leading algorithms (Chen et al 2019c,a, 2020) introduce data-hungry deep convolutional networks (He et al 2016; Simonyan and Zisserman 2015) to learn discriminative features, and thus they depend on collecting large-scale clean and complete multi-label datasets. However, it is very time-consuming to collect a consistent and exhaustive list of labels for every image, making collecting clean and complete multi-label annotations more difficult…” [Figure 1: Two examples of images with partial labels (unknown labels are highlighted in red).]

Section: Introduction (mentioning)

confidence: 99%

Structured Semantic Transfer for Multi-Label Recognition with Partial Labels

Chen, Pu, Wu et al. 2021

Preprint

Self Cite


Multi-label image recognition is a fundamental yet practical task because real-world images inherently possess multiple semantic labels. However, it is difficult to collect large-scale multi-label annotations due to the complexity of both the input images and output label spaces. To reduce the annotation cost, we propose a structured semantic transfer (SST) framework that enables training multi-label recognition models with partial labels, i.e., only some labels are known while the other labels are missing (also called unknown labels) per image. The framework consists of two complementary transfer modules that explore within-image and cross-image semantic correlations to transfer knowledge of known labels to generate pseudo labels for unknown labels. Specifically, an intra-image semantic transfer module learns an image-specific label co-occurrence matrix and maps the known labels to complement unknown labels based on this matrix. Meanwhile, a cross-image transfer module learns category-specific feature similarities and helps complement unknown labels with high similarities. Finally, both known and generated labels are used to train the multi-label recognition models. Extensive experiments on the Microsoft COCO, Visual Genome and Pascal VOC datasets show that the proposed SST framework obtains superior performance over current state-of-the-art algorithms. Code is available at https://github.com/HCPLab-SYSU/HCP-MLR-PL.
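The two pseudo-labelling ideas in the SST abstract can be illustrated with a small sketch. This is not the released HCP-MLR-PL code: the function names, the fixed thresholds, the max-over-known-positives aggregation, and the use of a fixed co-occurrence matrix (SST learns an image-specific one) are all assumptions made for illustration. Known labels are given as a dict mapping label index to 1 (positive) or 0 (negative); unknown labels are simply absent.

```python
import math


def intra_image_pseudo_labels(known, cooc, threshold=0.5):
    """Intra-image transfer (hypothetical sketch): promote an unknown label j
    to a pseudo-positive if some known-positive label i co-occurs strongly
    with it. cooc[i][j] is how strongly label i suggests label j; here it is
    a fixed matrix rather than a learned, image-specific one."""
    positives = [i for i, v in known.items() if v == 1]
    pseudo = set()
    for j in range(len(cooc)):
        if j in known:
            continue  # only unknown labels receive pseudo labels
        score = max((cooc[i][j] for i in positives), default=0.0)
        if score >= threshold:
            pseudo.add(j)
    return pseudo


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cross_image_pseudo_labels(features, known, bank, threshold=0.9):
    """Cross-image transfer (hypothetical sketch): promote an unknown label j
    if this image's category-specific feature for j is highly similar to a
    feature of j from another image where j is a known positive.

    features: dict label -> category-specific feature of the current image.
    bank:     dict label -> list of features from images with that label positive.
    """
    pseudo = set()
    for j, feats in bank.items():
        if j in known or j not in features or not feats:
            continue
        if max(cosine(features[j], f) for f in feats) >= threshold:
            pseudo.add(j)
    return pseudo
```

Following the abstract, the union of both pseudo-positive sets would then be used together with the known labels to train the recognition model; how SST actually weights or filters them is not captured here.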

