Proceedings of the 14th ACM International Conference on Web Search and Data Mining 2021
DOI: 10.1145/3437963.3441814

Semi-Supervised Text Classification via Self-Pretraining

Abstract: We present a neural semi-supervised learning model termed Self-Pretraining. Our model is inspired by the classic self-training algorithm. However, as opposed to self-training, Self-Pretraining is threshold-free, can potentially update its belief about previously labeled documents, and can cope with the semantic drift problem. Self-Pretraining is iterative and consists of two classifiers. In each iteration, one classifier draws a random set of unlabeled documents and labels them. This set is used to initializ…
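
Since the abstract is cut off, the following is only a minimal sketch of the loop it describes; the pretrain-then-finetune step and every name here (self_pretraining, make_classifier, the sample size) are assumptions, not the authors' exact procedure.

```python
# Minimal sketch of the iterative two-classifier loop described in the
# abstract. The abstract is truncated, so the pretrain-then-finetune
# step and all names below are assumptions, not the authors' exact
# procedure. `make_classifier` is any factory returning an object with
# scikit-learn-style fit/predict.
import random

def self_pretraining(X_labeled, y_labeled, X_unlabeled,
                     make_classifier, iterations=5, sample_size=1000):
    clf = make_classifier()
    clf.fit(X_labeled, y_labeled)
    for _ in range(iterations):
        # One classifier draws a random set of unlabeled documents and
        # labels them; note that no confidence threshold is applied.
        batch = random.sample(X_unlabeled,
                              min(sample_size, len(X_unlabeled)))
        pseudo_labels = clf.predict(batch)
        # The pseudo-labeled set initializes (pretrains) a fresh
        # classifier; re-drawing random batches each round lets the
        # model revise earlier beliefs and resist semantic drift.
        successor = make_classifier()
        successor.fit(batch, pseudo_labels)  # pretrain on pseudo-labels
        successor.fit(X_labeled, y_labeled)  # then train on gold labels
        clf = successor                      # roles swap next iteration
    return clf
```
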

Cited by 13 publications (10 citation statements)
References: 60 publications

“…For semi-supervised learning (Zhu, 2008; Van Engelen & Hoos, 2020; Karisani & Karisani, 2021), the main idea is to utilize a small number of labeled samples and a large amount of unlabeled samples to improve the performance of the learned hypothesis. One of the assumptions for semi-supervised learning is that the unlabeled examples hold the same distribution as that held by the labeled ones.…”
Section: Semi-supervised Learning (citation type: mentioning, confidence: 99%)
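
For a concrete, runnable instance of the labeled-plus-unlabeled setup quoted above, here is a small scikit-learn example using its built-in SelfTrainingClassifier on synthetic data. Note that this classic variant is threshold-based, unlike the threshold-free Self-Pretraining model; the data and hyperparameters are illustrative only.

```python
# Runnable illustration of semi-supervised learning with a small
# labeled set and a large unlabeled set, using scikit-learn's built-in
# SelfTrainingClassifier on synthetic data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=500, random_state=0)
y_train = y.copy()
y_train[50:] = -1  # scikit-learn convention: -1 marks unlabeled samples

model = SelfTrainingClassifier(LogisticRegression(max_iter=1000))
model.fit(X, y_train)  # iteratively pseudo-labels the -1 entries
print("accuracy on all data:", model.score(X, y))
```
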
“…Initiatives relying on self-reporting of symptoms have included apps such as ‘COVID 19 symptom tracker’ (https://covid.joinzoe.com/) in the UK, surveys disseminated via Facebook (https://jpsm.umd.edu/research/facebook-%28covid%29-symptom-survey), and analysis of posts on social media. 4 7 These initiatives may all provide useful complementary data to help populate modelling predictions of COVID-19 transmissions.…”
Section: Introduction (citation type: mentioning, confidence: 99%)
“…A social media mining approach has already been applied to tweets from users in the United States, China and Italy. 4 8 This research has either simply used Twitter volume on COVID-19, relied on the terms ‘cough’ and ‘fever’, or used synonyms for ‘COVID-19’ as a predictor, with surprisingly good results. 4 8 …”
Section: Introduction (citation type: mentioning, confidence: 99%)
“…We use unlabeled data to transfer the knowledge from the classifier in each view to the classifier in the other view. Additionally, we use a fine-tuning technique to mitigate the impact of noisy pseudo-labels after the initialization (Karisani and Karisani, 2021). As straightforward as it is to implement, our model achieves state-of-the-art performance on the largest publicly available ADR dataset, i.e., the SMM4H dataset.…”
Section: Introduction (citation type: mentioning, confidence: 99%)
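
The passage above describes a two-view pseudo-label exchange followed by fine-tuning on gold labels. Below is a minimal sketch under stated assumptions: clf_a and clf_b are hypothetical classifiers with scikit-learn-style fit/predict, the split of features into views is assumed, and the two consecutive fit calls stand in for pretrain-then-finetune (a plain scikit-learn estimator would simply retrain on the second call).

```python
# Sketch of one round of the two-view pseudo-label exchange with a
# gold-label fine-tuning pass, as outlined in the quoted passage.
# `clf_a`/`clf_b` are hypothetical classifiers exposing scikit-learn
# style fit/predict; the feature split into views is assumed.
def cross_view_round(clf_a, clf_b, X_a, X_b, y, U_a, U_b):
    # Each classifier labels the unlabeled pool in its own view and
    # hands the pseudo-labels to the classifier in the other view.
    pseudo_from_a = clf_a.predict(U_a)
    pseudo_from_b = clf_b.predict(U_b)
    # Pretrain on the transferred pseudo-labels, then fine-tune on the
    # gold labels to mitigate pseudo-label noise (Karisani and
    # Karisani, 2021).
    clf_b.fit(U_b, pseudo_from_a)
    clf_b.fit(X_b, y)
    clf_a.fit(U_a, pseudo_from_b)
    clf_a.fit(X_a, y)
    return clf_a, clf_b
```
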