2019 IEEE 13th International Conference on Semantic Computing (ICSC)
DOI: 10.1109/icosc.2019.8665546
The Use of Unlabeled Data Versus Labeled Data for Stopping Active Learning for Text Classification

Abstract: Annotation of training data is the major bottleneck in the creation of text classification systems. Active learning is a commonly used technique to reduce the amount of training data one needs to label. A crucial aspect of active learning is determining when to stop labeling data. Three potential sources for informing when to stop active learning are an additional labeled set of data, an unlabeled set of data, and the training data that is labeled during the process of active learning. To date, no one has comp…

Cited by 15 publications (22 citation statements). References 25 publications.
“…A number of stopping criteria for active learning have been proposed [77]-[82], but in this work we monitor and define convergence using the stabilizing predictions (SP) method, which evaluates performance based on unlabeled data [81], and the performance difference (PD) method, which considers the labeled examples [83]. The SP method examines the predictions of consecutive models at each iteration of the active learning procedure on a randomly selected set of 500 points, called the stop set, which is held constant throughout the active learning. We measure the difference in the regression predictions between subsequent rounds using the average Bhattacharyya distance D_B [84] between the posteriors of consecutive GPR models over the stop set.…”
Section: Stop Criteria
confidence: 99%
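The check described in this excerpt compares the posterior predictive distributions of consecutive Gaussian process regression models on a fixed stop set via the average Bhattacharyya distance. A minimal sketch of such a check, assuming scikit-learn-style GPR models that expose predict(X, return_std=True); the threshold and window size are illustrative assumptions, not values from the cited work:

```python
# Sketch of a stabilizing-predictions style convergence check for GPR,
# using the average Bhattacharyya distance between consecutive posteriors
# over a fixed stop set. Threshold and window are illustrative assumptions.
import numpy as np


def bhattacharyya_gaussian(mu1, var1, mu2, var2, eps=1e-12):
    """Bhattacharyya distance between two univariate Gaussians (elementwise)."""
    var_sum = var1 + var2 + eps
    return (0.25 * (mu1 - mu2) ** 2 / var_sum
            + 0.5 * np.log(var_sum / (2.0 * np.sqrt(var1 * var2) + eps)))


def average_bhattacharyya(model_prev, model_curr, stop_set):
    """Average D_B over the stop set between two consecutive GPR models.

    Both models are assumed to expose predict(X, return_std=True),
    as scikit-learn's GaussianProcessRegressor does.
    """
    mu_p, std_p = model_prev.predict(stop_set, return_std=True)
    mu_c, std_c = model_curr.predict(stop_set, return_std=True)
    d = bhattacharyya_gaussian(mu_p, std_p ** 2, mu_c, std_c ** 2)
    return float(np.mean(d))


def should_stop(db_history, threshold=1e-3, window=3):
    """Stop once the average D_B stays below `threshold` for `window` rounds."""
    if len(db_history) < window:
        return False
    return all(d < threshold for d in db_history[-window:])
```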
“…On the other hand, an essential aspect to consider in AL is determining a criterion for stopping the learning process. Some stopping criteria analyze the cost of obtaining new labels, set a maximum value for classifier performance or training-sample size, or analyze the quality of the examples in the datasets [12], [31]-[35]. One such criterion was proposed by Bloodgood and Vijay-Shanker and relies on the unlabeled dataset [34]. The method tests the new models obtained in consecutive iterations of AL on a separate unlabeled dataset to check whether their predictions have stabilized.…”
Section: Related Work
confidence: 99%
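The criterion described in this excerpt stops when predictions of consecutive models on an unlabeled stop set stabilize. A minimal sketch of one way to operationalize that check, assuming agreement between consecutive prediction sets is measured with Cohen's kappa; the 0.99 threshold and three-round window are illustrative assumptions rather than details taken from the cited work:

```python
# Sketch of a stabilizing-predictions stopping check on an unlabeled stop set:
# stop when consecutive models agree strongly on the same stop set for several
# rounds. Kappa threshold and window size are illustrative assumptions.
from sklearn.metrics import cohen_kappa_score


def predictions_stabilized(prediction_history, kappa_threshold=0.99, window=3):
    """prediction_history: list of label arrays, one per AL iteration,
    each holding that iteration's model predictions on the same stop set."""
    if len(prediction_history) < window + 1:
        return False
    recent = prediction_history[-(window + 1):]
    kappas = [cohen_kappa_score(a, b) for a, b in zip(recent, recent[1:])]
    return all(k >= kappa_threshold for k in kappas)
```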
“…Also, to decide on the stopping criterion for AL, we examined the learning curves and stopped the process when the classifier performance showed no improvement with additional iterations [17]. We use λ = 0.0001 as a threshold on the performance differences and stop the experiments when the mean of the performance differences does not exceed λ for a successive number of iterations.…”
Section: Experiments Settings
confidence: 99%
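The rule quoted in this excerpt stops once the mean of recent performance differences no longer exceeds λ = 0.0001 over successive iterations. A minimal sketch of that check; the window of five iterations is an illustrative assumption, as the excerpt does not specify how many successive iterations are required:

```python
# Sketch of a performance-difference stopping rule: stop when the mean
# absolute change in performance over the last `window` iterations does
# not exceed lambda. Window size is an illustrative assumption.
import numpy as np


def performance_difference_stop(scores, lam=1e-4, window=5):
    """scores: classifier performance (e.g. F1) recorded after each AL iteration."""
    if len(scores) < window + 1:
        return False
    diffs = np.abs(np.diff(scores[-(window + 1):]))
    return float(np.mean(diffs)) <= lam
```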