Adversarial Data Programming: Using GANs to Relax the Bottleneck of Curated Labeled Data

Pal, Arghya; Balasubramanian, Vineeth N

doi:10.1109/cvpr.2018.00168

Cited by 10 publications (8 citation statements)

References 45 publications (67 reference statements)


“…For example, [21] aims to reduce the computational cost and proposes a closed-formed solution for training the label model. [15,16,41-43] apply DP to computer vision. Concretely, [16,42,43] heavily rely on the pretrained models.…”

Section: Data Programming (mentioning), confidence: 99%

“…[15,16,41-43] apply DP to computer vision. Concretely, [16,42,43] heavily rely on the pretrained models. [41] combines crowdsourcing, data augmentation, and DP to create weak labels for image classification.…”

Section: Data Programming (mentioning), confidence: 99%

“…In Step 1 of our method, LFs are exploited to generate noisy labels for each unlabeled image. In previous DP works for computer vision, LFs are built via external image-agnostic knowledge [15] or pretrained models [16,42,43]. However, it is difficult to explicitly describe the rules of image classification.…”

Section: Labeling Function (mentioning), confidence: 99%

DP-SSL: Towards Robust Semi-supervised Learning with A Few Labeled Samples

Ding, Zhang, et al. (2021). Preprint

The scarcity of labeled data is a critical obstacle to deep learning. Semi-supervised learning (SSL) provides a promising way to leverage unlabeled data by pseudo labels. However, when the size of labeled data is very small (say a few labeled samples per class), SSL performs poorly and unstably, possibly due to the low quality of learned pseudo labels. In this paper, we propose a new SSL method called DP-SSL that adopts an innovative data programming (DP) scheme to generate probabilistic labels for unlabeled data. Different from existing DP methods that rely on human experts to provide initial labeling functions (LFs), we develop a multiple-choice learning (MCL) based approach to automatically generate LFs from scratch in SSL style. With the noisy labels produced by the LFs, we design a label model to resolve the conflict and overlap among the noisy labels, and finally infer probabilistic labels for unlabeled samples. Extensive experiments on four standard SSL benchmarks show that DP-SSL can provide reliable labels for unlabeled data and achieve better classification performance on test sets than existing SSL methods, especially when only a small number of labeled samples are available. Concretely, for CIFAR-10 with only 40 labeled samples, DP-SSL achieves 93.82% annotation accuracy on unlabeled data and 93.46% classification accuracy on test data, which are higher than the SOTA results.
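The label-model step described in this abstract (resolving conflict and overlap among noisy LF outputs, then inferring probabilistic labels) can be illustrated with a minimal sketch. This is not DP-SSL's label model: it assumes each LF's accuracy is already known, that LFs err independently with errors spread evenly over the wrong classes, and a uniform class prior; `ABSTAIN`, `lf_weights`, and `probabilistic_labels` are hypothetical names. Under those assumptions, summing per-LF log-odds weights and applying a softmax yields the naive-Bayes posterior over classes.

```python
import numpy as np

ABSTAIN = -1  # hypothetical sentinel: the labeling function does not vote


def lf_weights(acc, n_classes: int) -> np.ndarray:
    """Log-odds weight per LF, assuming errors are spread evenly over wrong classes."""
    acc = np.asarray(acc, dtype=float)
    return np.log(acc * (n_classes - 1) / (1.0 - acc))


def probabilistic_labels(votes: np.ndarray, weights: np.ndarray, n_classes: int) -> np.ndarray:
    """Combine noisy LF votes (n_samples, n_lfs) into class probabilities (n_samples, n_classes)."""
    n, m = votes.shape
    scores = np.zeros((n, n_classes))
    for j in range(m):
        rows = np.flatnonzero(votes[:, j] != ABSTAIN)
        # Each non-abstaining LF adds its weight to the class it voted for:
        # agreeing votes widen the margin between classes, conflicting votes narrow it.
        scores[rows, votes[rows, j]] += weights[j]
    # Softmax over classes; a row where every LF abstained stays uniform.
    scores -= scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    return probs / probs.sum(axis=1, keepdims=True)


votes = np.array([[0, 0, ABSTAIN],
                  [1, 0, 1],
                  [ABSTAIN, ABSTAIN, ABSTAIN]])
probs = probabilistic_labels(votes, lf_weights([0.9, 0.7, 0.8], 2), 2)
# rows ≈ [0.955, 0.045], [0.061, 0.939], [0.5, 0.5]
```

In the second row the two more accurate LFs outvote the third, so the conflict is resolved toward class 1 but with less than full confidence.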


“…CrowdGame [19] proposes a method for constructing labeling functions for entity resolution on structured data. Adversarial data programming [23] proposes a GAN-based framework for labeling with labeling function results and claims to be better than Snorkel-based approaches. In comparison, Inspector Gadget solves the different problem of partially analyzing large images.…”

Section: Related Work (mentioning), confidence: 99%

Inspector Gadget: A Data Programming-based Labeling System for Industrial Images

Heo, Roh, Hwang, et al. (2020). Preprint

As machine learning for images becomes democratized in the Software 2.0 era, one of the serious bottlenecks is securing enough labeled data for training. This problem is especially critical in a manufacturing setting where smart factories rely on machine learning for product quality control by analyzing industrial images. Such images are typically large and may only need to be partially analyzed where only a small portion is problematic (e.g., identifying defects on a surface). Since manually labeling these images is expensive, weak supervision is an attractive alternative where the idea is to generate weak labels that are not perfect, but can be produced at scale. Data programming is a recent paradigm in this category where it uses human knowledge in the form of labeling functions and combines them into a generative model. Data programming has been successful in applications based on text or structured data and can also be applied to images, usually if one can find a way to convert them into structured data. In this work, we expand the horizon of data programming by directly applying it to images without this conversion, which is a common scenario for industrial applications. We propose Inspector Gadget, an image labeling system that combines crowdsourcing, data augmentation, and data programming to produce weak labels at scale for image classification. We perform experiments on real industrial image datasets and show that Inspector Gadget obtains better accuracy than state-of-the-art techniques: Snuba, GOGGLES, and self-learning baselines using convolutional neural networks (CNNs) without pre-training.
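The data-programming idea of combining labeling functions into a generative model can be sketched with a one-coin model fitted by expectation-maximization, a simplified Dawid-Skene-style model. This is a textbook stand-in, not the model used by Snorkel, Inspector Gadget, or Adversarial Data Programming: it assumes conditionally independent LFs, a single accuracy parameter per LF, and errors spread evenly over the wrong classes. `fit_one_coin_em` and `ABSTAIN` are hypothetical names.

```python
import numpy as np

ABSTAIN = -1  # hypothetical sentinel for "labeling function has no opinion"


def fit_one_coin_em(votes: np.ndarray, n_classes: int, n_iter: int = 50, init_acc: float = 0.7):
    """Fit a one-coin generative label model with EM (illustrative sketch).

    votes: (n_samples, n_lfs) int array with labels in [0, n_classes) or ABSTAIN.
    Returns (posterior over true labels, estimated accuracy per LF).
    """
    n, m = votes.shape
    fired = votes != ABSTAIN
    acc = np.full(m, init_acc)
    prior = np.full(n_classes, 1.0 / n_classes)
    for _ in range(n_iter):
        # E-step: log P(y = c | votes) up to a constant, assuming independent LFs.
        log_post = np.tile(np.log(prior), (n, 1))
        for j in range(m):
            rows = np.flatnonzero(fired[:, j])
            log_agree = np.log(acc[j])
            log_disagree = np.log((1.0 - acc[j]) / (n_classes - 1))
            log_post[rows] += log_disagree                              # every class gets the "wrong" term
            log_post[rows, votes[rows, j]] += log_agree - log_disagree  # voted class gets the "right" term
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: re-estimate class prior and per-LF accuracy from the soft labels.
        prior = np.clip(post.mean(axis=0), 1e-6, None)
        prior /= prior.sum()
        for j in range(m):
            rows = np.flatnonzero(fired[:, j])
            if rows.size:
                acc[j] = post[rows, votes[rows, j]].mean()
        acc = np.clip(acc, 1e-3, 1 - 1e-3)  # keep the logs finite
    return post, acc
```

The returned `post` plays the role of the probabilistic labels an end model would be trained on, and `post.argmax(axis=1)` gives hard labels. Starting accuracies above 1/`n_classes` keep EM in the basin where LFs are better than chance, rather than a label-permuted solution.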


“…The dependence on large-scale annotated data has become the main bottleneck of progress in the use of deep learning. Because it is expensive to obtain enough annotated data [2]. In order to tackle such an unseen image recognition issue, generalized zero-shot learning (GZSL) is now extensively researched in some applications, such as autonomous object discovery system [3].…”

Section: Introduction (mentioning), confidence: 99%

FLPD-GANS: Fake License Plate Discrimination GANS for Generalized Zero-Shot Learning

2020

Proceedings of 2020 the 10th International Workshop on Computer Science and Engineering


Most current generalized zero-shot learning (GZSL) methods need sufficient labels and other auxiliary information to obtain great results. In this paper, we propose Fake License Plate Discrimination GANs (FLPD-GANs) and introduce the first publicly available New Energy License Plate (NELP) image dataset named CCPONECD. Applied in the license plate (LP) image binary classification task, FLPD-GANs only need a binary label for training and can address the strong bias problem in GZSL tasks. CCPONECD contains nearly 7k unique new energy vehicles images and provides detailed LP vertex location annotations. In our work, the seen class is only real NELP image and the unseen class is manufactured fake NELP image. Trained with merely real NELP images, our FLPD-GANs can greatly discriminate between real and fake NELP images. Extensive experiments demonstrate that our FLPD-GANs model has 97.7% accuracy and performs well in NELP image discrimination for GZSL task.
