2022
DOI: 10.1007/s10579-022-09597-1

Constructing a cross-document event coreference corpus for Dutch

Abstract: Event coreference resolution is a task in which different text fragments that refer to the same real-world event are automatically linked together. This task can be performed not only within a single document but also across different documents and can serve as a basis for many useful Natural Language Processing applications. Resources for this type of research, however, are extremely limited. We compiled the first large-scale dataset for cross-document event coreference resolution in Dutch, comparable in size…
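One common way to picture the task described in the abstract is as clustering: a pairwise model decides whether two event mentions corefer, and those decisions are merged transitively into event clusters that may span several documents. The sketch below is an illustration under stated assumptions, not the method used to build the corpus: the `(doc_id, trigger)` mention format, the function names, and the toy scorer that only compares lowercased trigger words are all hypothetical.

```python
from itertools import combinations


def cluster_mentions(mentions, is_coreferent):
    """Merge pairwise coreference decisions into event clusters (union-find).

    mentions: list of (doc_id, trigger) tuples, a hypothetical format.
    is_coreferent: any pairwise scorer returning True or False.
    """
    parent = list(range(len(mentions)))

    def find(i):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Score every pair, within and across documents alike.
    for i, j in combinations(range(len(mentions)), 2):
        if is_coreferent(mentions[i], mentions[j]):
            parent[find(i)] = find(j)

    clusters = {}
    for i, mention in enumerate(mentions):
        clusters.setdefault(find(i), []).append(mention)
    return list(clusters.values())


# Toy scorer (assumption): two mentions corefer if their triggers match.
def same_trigger(m1, m2):
    return m1[1].lower() == m2[1].lower()


mentions = [
    ("doc1", "ontmoeting"),
    ("doc1", "aanslag"),
    ("doc2", "Ontmoeting"),
    ("doc2", "verkiezing"),
]
result = cluster_mentions(mentions, same_trigger)
print(result)
# [[('doc1', 'ontmoeting'), ('doc2', 'Ontmoeting')], [('doc1', 'aanslag')], [('doc2', 'verkiezing')]]
```

Any pairwise model can be plugged in as `is_coreferent`; the transitive clustering step stays the same regardless of how pairs are scored.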

Cited by 5 publications (20 citation statements)
References: 45 publications
“…From these studies, it became clear that outward lexical similarity was the prime indicator of coreference, with features based on (partial) string matching being most effective [7]. Additionally, features modeling document structure had some limited success in within-document contexts [7,25]. More recently, however, feature-based approaches in coreference resolution have been challenged by transformer-based approaches in which large language models are used to generate strong contextual mention representations, which are then used as a basis for the classification algorithms [26,27].…”
Section: Methods (mentioning; confidence: 99%)
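As a rough illustration of what features based on exact and partial string matching can look like, the function below computes an exact-match flag, token overlap, and a character-level similarity for a pair of mention strings. The feature names and the use of `difflib.SequenceMatcher` are assumptions made for this sketch; they are not the feature set of the cited studies.

```python
from difflib import SequenceMatcher


def string_match_features(m1: str, m2: str) -> dict:
    """Illustrative (hypothetical) lexical features for a mention pair."""
    a, b = m1.lower().strip(), m2.lower().strip()
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    return {
        # 1.0 if the normalized strings are identical, else 0.0.
        "exact_match": float(a == b),
        # Partial match: share of distinct tokens the two mentions have in common.
        "token_jaccard": len(tokens_a & tokens_b) / len(union) if union else 0.0,
        # Partial match at character level, between 0.0 and 1.0.
        "char_similarity": SequenceMatcher(None, a, b).ratio(),
    }


feats = string_match_features("de aanslag in Parijs", "aanslag Parijs")
print(feats["exact_match"], feats["token_jaccard"])  # 0.0 0.5
```

In a feature-based system, values like these would be combined with other features in a pairwise classifier; the transformer-based approaches the statement contrasts them with rely on contextual mention representations instead.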
“…Interestingly, earlier work on feature-based classifiers for ECR has shown that discourse and meta-linguistic information surrounding an event are in fact important, to some degree, for the classification of coreference (Lu and Ng, 2018). In this paper, we will devise a series of linguistic probes in order to gauge a Dutch transformer-based coreference model's understanding of certain discourse and meta-linguistic event traits that have been shown to be important for within-document ECR (De Langhe et al, 2022c; Lu and Ng, 2018). Currently, it is assumed that this type of information is implicitly encoded into the transformer's contextual embeddings, but with this paper we intend to verify this.…”
Section: De Franse President Macron Ontmoette De [English: "The French President Macron Met The"] (mentioning; confidence: 99%)
“…Our data consists of the Dutch ENCORE corpus (De Langhe et al, 2022a), which includes 15,407 events spread over 1,015 documents that were sourced from a Dutch newspaper article collection (Vermeulen, 2018). The corpus is comparable in size to most large-scale English-language ECR datasets.…”
Section: Data (mentioning; confidence: 99%)