Prior highly-tuned human parsing models tend to fit towards each dataset in a specific domain or with discrepant label granularity, and can hardly be adapted to other human parsing tasks without extensive re-training. In this paper, we aim to learn a single universal human parsing model that can tackle all kinds of human parsing needs by unifying label annotations from different domains or at various levels of granularity. This poses many fundamental learning challenges, e.g. discovering underlying semantic structures among different label granularity, performing proper transfer learning across different image domains, and identifying and utilizing label redundancies across related tasks.To address these challenges, we propose a new universal human parsing agent, named "Graphonomy", which incorporates hierarchical graph transfer learning upon the conventional parsing network to encode the underlying label semantic structures and propagate relevant semantic information. In particular, Graphonomy first learns and propagates compact high-level graph representation among the labels within one dataset via Intra-Graph Reasoning, and then transfers semantic information across multiple datasets via Inter-Graph Transfer. Various graph transfer dependencies (e.g., similarity, linguistic knowledge) between different datasets are analyzed and encoded to enhance graph transfer capability. By distilling universal semantic graph representation to each specific task, Graphonomy is able to predict all levels of parsing labels in one system without piling up the complexity. Experimental results show Graphonomy effectively achieves the state-of-the-art results on three human parsing benchmarks as well as advantageous universal human parsing performance.
Accurately classifying colorectal polyps, or differentiating malignant from benign ones, has a significant clinical impact on early detection and identifying optimal treatment of colorectal cancer. Convolution neural network (CNN) has shown great potential in recognizing different objects (e.g. human faces) from multiple slice (or color) images, a task similar to the polyp differentiation, given a large learning database. This study explores the potential of CNN learning from multiple slice (or feature
Matching clothing images from customers and online shopping stores has rich applications in E-commerce. Existing algorithms encoded an image as a global feature vector and performed retrieval with the global representation. However, discriminative local information on clothes are submerged in this global representation, resulting in suboptimal performance. To address this issue, we propose a novel Graph Reasoning Network (GRNet) on a Similarity Pyramid, which learns similarities between a query and a gallery cloth by using both global and local representations in multiple scales. The similarity pyramid is represented by a Graph of similarity, where nodes represent similarities between clothing components at different scales, and the final matching score is obtained by message passing along edges. In GRNet, graph reasoning is solved by training a graph convolutional network, enabling to align salient clothing components to improve clothing retrieval. To facilitate future researches, we introduce a new benchmark FindFashion, containing rich annotations of bounding boxes, views, occlusions, and cropping. Extensive experiments show that GRNet obtains new state-of-the-art results on two challenging benchmarks, e.g. pushing the top-1, top-20, and top-50 accuracies on DeepFashion to 26%, 64%, and 75% (i.e. 4%, 10%, and 10% absolute improvements), outperforming competitors with large margins. On FindFashion, GRNet achieves considerable improvements on all empirical settings.
The automatic style translation of Chinese characters (CH-Char) is a challenging problem. Different from English or general artistic style transfer, Chinese characters contain a large number of glyphs with the complicated content and characteristic style. Early methods on CH-Char synthesis are inefficient and require manual intervention. Recently some GAN-based methods are proposed for font generation. The supervised GAN-based methods require numerous image pairs, which is difficult for many chirography styles. In addition, unsupervised methods often cause the blurred and incorrect strokes. Therefore, in this work, we propose a three-stage Generative Adversarial Network (GAN) architecture for multi-chirography image translation, which is divided into skeleton extraction, skeleton transformation and stroke rendering with unpaired training data. Specifically, we first propose a fast skeleton extraction method (ENet). Secondly, we utilize the extracted skeleton and the original image to train a GAN model, RNet (a stroke rendering network), to learn how to render the skeleton with stroke details in target style. Finally, the pre-trained model RNet is employed to assist another GAN model, TNet (a skeleton transformation network), to learn to transform the skeleton structure on the unlabeled skeleton set. We demonstrate the validity of our method on two chirography datasets we established.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.