Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence 2020
DOI: 10.24963/ijcai.2020/369

Textual Membership Queries

Abstract: Human labeling of data can be very time-consuming and expensive, yet in many cases it is critical to the success of the learning process. To minimize human labeling effort, we propose a novel active learning solution that does not rely on existing sources of unlabeled data. It uses a small amount of labeled data as the core set for the synthesis of useful membership queries (MQs) — unlabeled instances generated by an algorithm for human labeling. Our solution uses modification operators…
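The synthesis idea the abstract describes can be sketched in a few lines: starting from a small labeled core set, a modification operator perturbs an existing instance to produce a new, unlabeled instance (a membership query) for human labeling. Everything below is a hypothetical toy sketch; the operator and function names are not from the paper.

```python
import random

def swap_adjacent_words(text, rng):
    """One toy modification operator: swap two adjacent words."""
    words = text.split()
    if len(words) < 2:
        return text
    i = rng.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

def synthesize_queries(core_set, operators, n_queries, seed=0):
    """Generate membership queries (unlabeled texts) from a labeled core set."""
    rng = random.Random(seed)
    queries = []
    while len(queries) < n_queries:
        text, _label = rng.choice(core_set)  # the label is ignored: MQs are unlabeled
        op = rng.choice(operators)
        queries.append(op(text, rng))
    return queries

core = [("the movie was great", 1), ("the plot was dull", 0)]
mqs = synthesize_queries(core, [swap_adjacent_words], n_queries=3)
```

The paper's actual modification operators act on text more carefully than this word swap; the sketch only shows the overall shape of query synthesis from labeled data.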


Cited by 6 publications (6 citation statements)
References 22 publications
“…Another well-known method is the generation of new text using RNNs [30], GPT [12,18,36,42], or GANs [3]. In [44], dependency-based embeddings were applied for word substitution to generate text while leveraging textual membership queries. Other studies leveraged solutions such as shifting the position of words in a zero-padded representation [30]; synthetic minority oversampling, random over- and under-sampling, and ADASYN [29]; adding common misspellings of words to the data and collecting tweets that contain swear words in conjunction with positive adjectives, or racial and religious tweets [38]; adding tweets with disgust and anger emotions from suspended accounts to the data [1]; and bootstrapping from another dataset with embedding-based [13] or sentiment-polarity-based [8] methods.…”
Section: Related Studies
confidence: 99%
“…This also implies that we need a labeller or oracle O, since newly generated points do not come with a label. For experimentation, however, the oracle can be replaced by a classifier [14]. Two more important things to decide are when to stop and how much initial labelled data to use, which we denote as the seed.…”
Section: Membership Query Synthesis and Related Work
confidence: 99%
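The loop this quote describes — a small labelled seed, an oracle that is a classifier rather than a human during experimentation, and a stopping criterion — can be sketched as follows. All names here (the rule-based oracle, the toy candidate generator, the fixed budget) are hypothetical stand-ins, not the cited paper's setup.

```python
def oracle(x):
    """Stand-in for the human labeller: a trivial rule-based classifier."""
    return 1 if "great" in x else 0

def generate_candidate(labelled, step):
    """Toy synthesis step: extend an existing example with a marker word."""
    base, _ = labelled[step % len(labelled)]
    return base + (" great" if step % 2 == 0 else " dull")

def mqs_loop(seed_data, budget):
    labelled = list(seed_data)        # the initial labelled seed
    for step in range(budget):        # stopping criterion: a fixed query budget
        x = generate_candidate(labelled, step)
        y = oracle(x)                 # query the (simulated) oracle
        labelled.append((x, y))       # the new point joins the training pool
    return labelled

result = mqs_loop([("a fine film", 1), ("a bad film", 0)], budget=4)
```

The point of the sketch is only the division of roles: a generator proposes points, the oracle (here a classifier) labels them, and the seed size and budget are the two free knobs the quote highlights.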
“…The only other work we have found on textual MQS is that of [14], where sentences are edited to perturb the examples and create new ones. To do so, words are replaced by semantically near substitutes, and the resulting sentences are then labelled to see whether the replacement changed their class.…”
Section: Membership Query Synthesis and Related Work
confidence: 99%
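The editing scheme described here — replace a word with a semantically near substitute, then have the labeller judge whether the class changed — can be illustrated with a toy example. A hand-written nearest-neighbour table stands in for the dependency-based embeddings the citing work mentions; it is an assumption for illustration only.

```python
# Hypothetical nearest-neighbour table; in the cited work, neighbours
# come from dependency-based word embeddings, not a hand-written dict.
NEAREST = {
    "good": ["great", "decent"],
    "bad": ["poor", "awful"],
}

def substitute(sentence):
    """Yield one edited sentence per (word, substitute) pair."""
    words = sentence.split()
    for i, w in enumerate(words):
        for sub in NEAREST.get(w, []):
            yield " ".join(words[:i] + [sub] + words[i + 1:])

edits = list(substitute("a good film with bad pacing"))
```

Each edited sentence would then be sent to the labeller; edits whose label differs from the original's reveal where the class boundary lies.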
“…Several works have proposed to perturb the original example in the feature space [5][6][7]. While these methods are of great use in some domains, they may be inapplicable in other important domains, such as computer vision and NLP, where points in the feature space are incomprehensible to humans [29].…”
Section: Plausibility of Generated Counterfactuals
confidence: 99%