Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval 2021
DOI: 10.1145/3404835.3463255

A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers

Abstract: Named entity processing over historical texts is more and more being used due to the massive documents and archives being stored in digital libraries. However, due to the poor annotated resources of historical nature, information extraction performances fall behind those on contemporary texts. In this paper, we introduce the development of the NewsEye resource, a multilingual dataset for named entity recognition and linking enriched with stances towards named entities. The dataset is comprised of diachronic hi…

Cited by 27 publications (18 citation statements)
References 16 publications
“…Tables 5-8 show the performance of our hmBERT 32k models compared to the current state-of-the-art. For German, even the hmBERT 32k base model could not reach the performance reported by Hamdi et al. [20]. The performance difference is 1.64%.…”
Section: Downstream Task Evaluation
Citation type: mentioning
confidence: 55%
“…We evaluate the hmBERT 32k models on the NewsEye NER dataset [7], because this dataset includes most of the languages that hmBert covers (except English), and compare them with current state-of-the-art reported by Hamdi et al. [20]. We use the Flair [21] library and perform a hyperparameter search (see Table 18 in appendix).…”
Section: Downstream Task Evaluation
Citation type: mentioning
confidence: 99%
“…However, most approaches to entity linking and toponym resolution are optimized to perform well with clean texts originally intended for a global audience and they do not generalize well to noisy, historical, or regional texts (Ehrmann, Romanello, Flückiger, & Clematide, 2020; Gritta, Pilehvar, Limsopatham, & Collier, 2018; Wang & Hu, 2019). Some entity linking datasets have been created to address this issue, such as Ehrmann et al. (2020) and Hamdi et al. (2021), both built from digitized historical newspaper collections.…”
Section: Overview
Citation type: mentioning
confidence: 99%