Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.584
New Benchmark Corpus and Models for Fine-grained Event Classification: To BERT or not to BERT?

Abstract: We introduce a new set of benchmark datasets derived from ACLED data for fine-grained event classification and compare the performance of various state-of-the-art machine learning models on these datasets, including SVMs based on TF-IDF character n-grams and neural context-free embeddings (GloVe and fastText), as well as deep learning-based BERT with its contextual embeddings. The best results in terms of micro (94.3-94.9%) and macro F1 (86.0-88.9%) were obtained using the BERT transformer, with simpler TF-IDF char…
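The simpler baseline named in the abstract — an SVM over TF-IDF character n-grams, scored with micro and macro F1 — can be sketched as below. This is a minimal illustration using scikit-learn; the toy texts and event labels are invented stand-ins, not the ACLED-derived data, and the n-gram range and SVM hyperparameters are assumptions rather than the paper's exact settings.

```python
# Hedged sketch of a TF-IDF character n-gram + linear SVM baseline,
# evaluated with micro and macro F1. Toy data, not the ACLED corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = [
    "Protesters gathered in the square demanding reforms",
    "Demonstrators marched peacefully through the capital",
    "Rioters clashed with police and set vehicles on fire",
    "Rioters looted shops and clashed with security forces",
    "An armed group attacked a village overnight",
    "Militants attacked a checkpoint near the border",
]
train_labels = ["protest", "protest", "riot", "riot", "attack", "attack"]

# analyzer="char_wb" with 3-5-grams mirrors "TF-IDF character n-grams";
# the exact range used in the paper may differ.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)),
    LinearSVC(),
)
model.fit(train_texts, train_labels)

test_texts = [
    "Protesters marched in the square",
    "Rioters set fire to vehicles",
    "An armed group attacked a checkpoint",
]
test_labels = ["protest", "riot", "attack"]
pred = model.predict(test_texts)

# Micro F1 aggregates over all instances; macro F1 averages per-class
# scores, so rare classes weigh it down — hence the gap in the abstract.
micro = f1_score(test_labels, pred, average="micro")
macro = f1_score(test_labels, pred, average="macro")
print(micro, macro)
```

The gap between the reported micro (94.3-94.9%) and macro (86.0-88.9%) scores is typical of fine-grained, class-imbalanced classification, since macro F1 penalizes poor performance on infrequent event types.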

Cited by 21 publications (19 citation statements).
References 17 publications.
“…Although most of the reported work in this area focuses on processing English texts, in particular news-like texts as presented in Piskorski et al (2020), some efforts on event classification for non-English languages have been reported too. For instance, Sahoo et al (2020) introduced a benchmark corpus for fine-grained classification of natural and man-made disasters (28 types) in Hindi, accompanied by an evaluation of deep learning baseline models for this task.…”
Section: Prior Work (mentioning)
confidence: 99%
“…For training purposes, the participants were allowed to exploit any freely available existing event-annotated textual corpora and/or the short text snippets reporting events that are part of the large event database created by ACLED, which can be obtained from the ACLED data portal 4 for research and academic purposes. Furthermore, the participants were also encouraged to draw inspiration from the techniques for text normalization and cleaning of ACLED data, and from the baseline classification models trained on ACLED data, described in Piskorski et al (2020).…”
Section: Training Data (mentioning)
confidence: 99%
“…Piskorski refers to the bert-base model fine-tuned by Piskorski et al (2020), which was the top performer on test fold 1 in their experiments. Test scores for the Piskorski model trained on 10% of the training data are estimated from the graphs in Piskorski et al (2020); metrics are reported on test fold 1.…”
Section: Models (mentioning)
confidence: 99%
“…What all of the systems have in common is that they need a representation of text that is understandable to a computer. Piskorski et al (2020) showed that modern transformer embeddings are the best choice, comparing them to classic word embeddings and achieving superior results with them. Based on these findings, we decided to make use of them in our work too.…”
Section: Related Work (mentioning)
confidence: 99%
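The distinction this citing work relies on — contextual transformer embeddings versus classic context-free word embeddings — can be illustrated with a toy sketch. The mock "contextual" encoder below (a simple neighbour-averaging step) and the random vectors are hypothetical stand-ins, not BERT or GloVe; the point is only that a context-free lookup assigns the same vector to a word everywhere, while a contextual encoder does not.

```python
# Toy contrast between a context-free embedding lookup and a mock
# contextual encoder. Vectors are random; neither real GloVe nor BERT.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "bank", "river", "loan", "by", "a"]
static = {w: rng.normal(size=4) for w in vocab}  # context-free table

def static_embed(tokens):
    # Same word -> same vector, regardless of the sentence.
    return [static[t] for t in tokens]

def toy_contextual_embed(tokens):
    # Mock "contextual" encoder: average each token's vector with its
    # neighbours, so identical words in different sentences differ.
    vecs = np.stack([static[t] for t in tokens])
    out = []
    for i in range(len(tokens)):
        lo, hi = max(0, i - 1), min(len(tokens), i + 2)
        out.append(vecs[lo:hi].mean(axis=0))
    return out

s1 = ["the", "bank", "by", "a", "river"]
s2 = ["a", "bank", "loan"]
i1, i2 = s1.index("bank"), s2.index("bank")

same_static = np.allclose(static_embed(s1)[i1], static_embed(s2)[i2])
same_ctx = np.allclose(toy_contextual_embed(s1)[i1],
                       toy_contextual_embed(s2)[i2])
print(same_static, same_ctx)  # → True False
```

A real contextual model (e.g. BERT) replaces the neighbour-averaging step with stacked self-attention layers, which is what lets "bank" in a financial sentence and "bank" in a river sentence receive different representations.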