Proceedings of the Fourteenth Workshop on Semantic Evaluation 2020
DOI: 10.18653/v1/2020.semeval-1.187

ApplicaAI at SemEval-2020 Task 11: On RoBERTa-CRF, Span CLS and Whether Self-Training Helps Them

Abstract: This paper presents the winning system for the propaganda Technique Classification (TC) task and the second-placed system for the propaganda Span Identification (SI) task. The purpose of the TC task was to identify the propaganda technique applied in a given propaganda text fragment. The goal of the SI task was to find the specific text fragments which contain at least one propaganda technique. Both of the developed solutions used the semi-supervised learning technique of self-training. Interestingly, although CRF is barely used…

Cited by 35 publications (15 citation statements) | References 9 publications

“…Semi-supervised learning (Zhu and Goldberg, 2009) is a widely known training paradigm where a model is first trained on a human-labelled dataset and is then used to extend the training set by automatically annotating an unlabelled dataset. Following previous studies (Thakur et al., 2021b; Jurkiewicz et al., 2020), we initially train on the original training set; then, for all the generated unlabelled document pairs, we use the previously trained model for inference to obtain similarity scores for the new synthetic document pairs. Finally, we train our entity-enriched Siamese Transformer in a semi-supervised fashion on the complete augmented training set.…”
Section: Semi-supervised Learning
Citation type: mentioning, confidence: 99%
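
The self-training recipe quoted above (train on gold labels, pseudo-label an unlabelled pool with the current model, retrain on the union) can be sketched in a few lines. The following is a minimal illustration with a toy classifier and synthetic data, not the cited papers' code; the 0.9 confidence threshold is an assumption for the sketch.

```python
# Minimal self-training sketch (hypothetical, toy data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy stand-ins for a labelled set and an unlabelled pool.
X_lab = rng.normal(size=(100, 8))
y_lab = (X_lab[:, 0] > 0).astype(int)
X_unlab = rng.normal(size=(1000, 8))

# Step 1: train on the original (gold) training set.
model = LogisticRegression().fit(X_lab, y_lab)

# Step 2: pseudo-label the unlabelled pool; keep only confident
# predictions (the 0.9 threshold is an assumption, not from the paper).
proba = model.predict_proba(X_unlab)
confident = proba.max(axis=1) > 0.9
X_silver = X_unlab[confident]
y_silver = proba[confident].argmax(axis=1)

# Step 3: retrain on the augmented (gold + silver) training set.
model = LogisticRegression().fit(
    np.vstack([X_lab, X_silver]),
    np.concatenate([y_lab, y_silver]),
)
```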
“…The systems that took part in the SemEval-2020 Task 11 challenge represent the most recent approaches to identifying propaganda techniques in given propagandist spans. The most interesting and successful approach (Jurkiewicz et al., 2020) proposes, first, to extend the training data from a free-text corpus into a silver dataset and, second, an ensemble model that exploits both the gold and silver datasets during training to achieve the highest scores. Notice that most of the best-performing recent models rely heavily on transformer-based architectures.…”
Section: Related Work
Citation type: mentioning, confidence: 99%
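
As a rough illustration of exploiting gold and silver data together, the sketch below trains one model on gold data only and one on gold plus silver, then averages their class probabilities at prediction time. This is a hypothetical scheme on toy data; the cited system's actual ensembling procedure may differ.

```python
# Hedged sketch: ensembling a gold-only model with a gold+silver model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_gold = rng.normal(size=(200, 8))
y_gold = (X_gold[:, 0] > 0).astype(int)
X_silver = rng.normal(size=(400, 8))        # toy stand-in for silver data
y_silver = (X_silver[:, 0] > 0).astype(int)

model_gold = LogisticRegression().fit(X_gold, y_gold)
model_both = LogisticRegression().fit(
    np.vstack([X_gold, X_silver]),
    np.concatenate([y_gold, y_silver]),
)

def ensemble_predict(models, X):
    # Average class probabilities across members, then take the argmax.
    proba = np.mean([m.predict_proba(X) for m in models], axis=0)
    return proba.argmax(axis=1)

preds = ensemble_predict([model_gold, model_both], rng.normal(size=(5, 8)))
```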
“…A recently popular approach in Named-Entity Recognition tasks has been to combine Conditional Random Fields (CRF) with BERT-based models. Inspired by the CRF-based approaches (Souza et al., 2019; Jurkiewicz et al., 2020), we use BERT-based models with a single BiLSTM layer and a CRF layer. During training, the CRF loss is used, and during prediction, Viterbi decoding is performed.…”
Section: LSTM-CRF
Citation type: mentioning, confidence: 99%
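
A minimal sketch of the BERT + BiLSTM + CRF tagger this excerpt describes, assuming the Hugging Face `transformers` package and the `pytorch-crf` package for the CRF layer; the encoder name, LSTM width, and tag-set size are placeholders, not the cited systems' settings.

```python
# Sketch of a BERT encoder + single BiLSTM layer + CRF tagging head.
import torch
import torch.nn as nn
from transformers import AutoModel
from torchcrf import CRF  # pip install pytorch-crf

class BertBiLstmCrf(nn.Module):
    def __init__(self, num_tags, encoder_name="bert-base-cased", lstm_dim=256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # A single BiLSTM layer over the contextual embeddings.
        self.lstm = nn.LSTM(hidden, lstm_dim,
                            batch_first=True, bidirectional=True)
        self.emissions = nn.Linear(2 * lstm_dim, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        states = self.encoder(input_ids,
                              attention_mask=attention_mask).last_hidden_state
        states, _ = self.lstm(states)
        emissions = self.emissions(states)
        mask = attention_mask.bool()
        if tags is not None:
            # Training: negative log-likelihood under the CRF.
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        # Prediction: Viterbi decoding of the best tag sequence per example.
        return self.crf.decode(emissions, mask=mask)
```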
“…Most teams use multi-granular transformer-based systems for token classification/sequence tagging (Khosla et al., 2020; Morio et al., 2020; Patil et al., 2020). Inspired by Souza et al. (2019), Jurkiewicz et al. (2020) use RoBERTa-CRF-based systems. Li and Xiao (2020) use a variant of a SpanBERT span-prediction system.…”
Section: Introduction
Citation type: mentioning, confidence: 99%