Multi-label Zero-Shot Learning with Structured Knowledge Graphs

Lee, Chung-Wei; Fang, Wei; Yeh, Chih‐Kuan; Wang, Yu-Chiang Frank

doi:10.1109/cvpr.2018.00170

Cited by 277 publications

(173 citation statements)

References 39 publications

Supporting

Mentioning

173

Contrasting

Order By: Relevance

“…Quantitative results are reported in Table 1. We compare with state-of-the-art methods, including CNN-RNN [28], RNN-Attention [29], Order-Free RNN [1], ML-ZSL [15], SRN [36], Multi-Evidence [6], etc. For the proposed ML-GCN, we report the results based on the binary correlation matrix ("ML-GCN (Binary)") and the re-weighted correlation matrix ("ML-GCN (Re-weighted)"), respectively.…”

Section: Comparisons With State-of-the-artsmentioning

confidence: 99%

Multi-Label Image Recognition With Graph Convolutional Networks

Chen

Wei

Wang

et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

974

456

View full text Add to dashboard Cite

The task of multi-label image recognition is to predict a set of object labels that present in an image. As objects normally co-occur in an image, it is desirable to model the label dependencies to improve the recognition performance. To capture and explore such important dependencies, we propose a multi-label classification model based on Graph Convolutional Network (GCN). The model builds a directed graph over the object labels, where each node (label) is represented by word embeddings of a label, and GCN is learned to map this label graph into a set of inter-dependent object classifiers. These classifiers are applied to the image descriptors extracted by another sub-net, enabling the whole network to be end-to-end trainable. Furthermore, we propose a novel re-weighted scheme to create an effective label correlation matrix to guide information propagation among the nodes in GCN. Experiments on two multi-label image recognition datasets show that our approach obviously outperforms other existing state-of-the-art methods. In addition, visualization analyses reveal that the classifiers learned by our model maintain meaningful semantic topology.

show abstract

Section: Comparisons With State-of-the-artsmentioning

confidence: 99%

Multi-Label Image Recognition With Graph Convolutional Networks

Chen

Wei

Wang

et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

974

456

View full text Add to dashboard Cite

show abstract

“…It has been extensively studied to incorporate prior knowledge to aid numerous vision tasks [20,8,15,7,2,18]. For example, Marino et al [20] constructed a knowledge graph based on the WordNet [22] and the Visual Genome dataset [14], and learned the representation of this graph to enhance image feature representation to promote multilabel recognition.…”

Section: Knowledge Representationmentioning

confidence: 99%

Knowledge-Embedded Routing Network for Scene Graph Generation

Chen¹,

Yu²,

Chen³

et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

394

332

View full text Add to dashboard Cite

To understand a scene in depth not only involves locating/recognizing individual objects, but also requires to infer the relationships and interactions among them. However, since the distribution of real-world relationships is seriously unbalanced, existing methods perform quite poorly for the less frequent relationships. In this work, we find that the statistical correlations between object pairs and their relationships can effectively regularize semantic space and make prediction less ambiguous, and thus well address the unbalanced distribution issue. To achieve this, we incorporate these statistical correlations into deep neural networks to facilitate scene graph generation by developing a Knowledge-Embedded Routing Network. More specifically, we show that the statistical correlations between objects appearing in images and their relationships, can be explicitly represented by a structured knowledge graph, and a routing mechanism is learned to propagate messages through the graph to explore their interactions. Extensive experiments on the large-scale Visual Genome dataset demonstrate the superiority of the proposed method over current state-ofthe-art competitors.

show abstract

“…Taking advantage of geometric deep learning [19,20], our Graphonomy simply integrates two cooperative modules for graph transfer learning. First, we introduce Intra-Graph Reasoning to progressively refine graph representations within the same graph structure, in which each graph node is responsible for segmenting out regions of one semantic part in a dataset.…”

Section: Inter-graph Connectionmentioning

confidence: 99%

“…Knowledge-guided Graph Reasoning. Many research efforts recently model domain knowledge as a graph for mining correlations among labels or objects in images, which has been proved effective in many tasks [5,19,20,29,35]. For example, Chen et al [5] leveraged local regionbased reasoning and global reasoning to facilitate object detection.…”

Section: Related Workmentioning

confidence: 99%

Graphonomy: Universal Human Parsing via Graph Transfer Learning

Gong

Gao

Liang

et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

182

105

View full text Add to dashboard Cite

Prior highly-tuned human parsing models tend to fit towards each dataset in a specific domain or with discrepant label granularity, and can hardly be adapted to other human parsing tasks without extensive re-training. In this paper, we aim to learn a single universal human parsing model that can tackle all kinds of human parsing needs by unifying label annotations from different domains or at various levels of granularity. This poses many fundamental learning challenges, e.g. discovering underlying semantic structures among different label granularity, performing proper transfer learning across different image domains, and identifying and utilizing label redundancies across related tasks.To address these challenges, we propose a new universal human parsing agent, named "Graphonomy", which incorporates hierarchical graph transfer learning upon the conventional parsing network to encode the underlying label semantic structures and propagate relevant semantic information. In particular, Graphonomy first learns and propagates compact high-level graph representation among the labels within one dataset via Intra-Graph Reasoning, and then transfers semantic information across multiple datasets via Inter-Graph Transfer. Various graph transfer dependencies (e.g., similarity, linguistic knowledge) between different datasets are analyzed and encoded to enhance graph transfer capability. By distilling universal semantic graph representation to each specific task, Graphonomy is able to predict all levels of parsing labels in one system without piling up the complexity. Experimental results show Graphonomy effectively achieves the state-of-the-art results on three human parsing benchmarks as well as advantageous universal human parsing performance.

show abstract

Multi-label Zero-Shot Learning with Structured Knowledge Graphs

Cited by 277 publications

References 39 publications

Multi-Label Image Recognition With Graph Convolutional Networks

Multi-Label Image Recognition With Graph Convolutional Networks

Knowledge-Embedded Routing Network for Scene Graph Generation

Graphonomy: Universal Human Parsing via Graph Transfer Learning

Contact Info

Product

Resources

About