2020
DOI: 10.1007/978-3-030-64452-9_19

Entity Linking for Historical Documents: Challenges and Solutions

Abstract: Named entities (NEs) are among the most relevant types of information that can be used to efficiently index and retrieve digital documents. Furthermore, the use of Entity Linking (EL) to disambiguate and relate NEs to knowledge bases provides supplementary information which can be useful to differentiate ambiguous elements such as geographical locations and people's names. In historical documents, the detection and disambiguation of NEs is a challenge. Most historical documents are converted into plain text us…

Citations: cited by 8 publications (5 citation statements)
References: 24 publications (63 reference statements)
“…In many cases, entity extraction is performed manually [27]. In other cases, automatic techniques for entity extraction are used [28][29][30]. The main techniques for NER include rule-based approaches, machine-learning-based approaches, and deep-learning approaches [25].…”
Section: Entity Extraction (mentioning)
confidence: 99%
“…Other works studied the development of features and rules to improve specific-domain NEL (Heino et al 2017) or entity types (Brando et al 2016). Moreover, some studies focused on the effect of problems frequently encountered in historical documents on NEL (Linhares Pontes et al 2020a). They represented the entities in a continuous space and combined them with a neural attention mechanism to analyze context words and candidate entity embeddings to disambiguate mentions in historical documents.…”
Section: OCR Errors and Named-Entity Linking (mentioning)
confidence: 99%
“…The most basic form involves performing string matching. However, Levenshtein distance and other fuzzy matching techniques may be more useful as they can still provide helpful information if there is no exact match [33,34]. More recently, similarity measures have been applied to contextualised embeddings rather than the surface form of the mention and entity [35,36,6].…”
Section: Candidate Generation (mentioning)
confidence: 99%
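
The candidate-generation statement above contrasts exact string matching with Levenshtein-based fuzzy matching. The following is a minimal sketch of that idea, not the method of the cited papers: the function names, the `max_distance` threshold, and the toy list of entity names are all assumptions made for illustration. It tries an exact (case-insensitive) match first and falls back to edit distance, which is what lets an OCR-damaged mention still retrieve plausible candidates.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def generate_candidates(
    mention: str,
    entity_names: List[str],
    max_distance: int = 2,  # hypothetical threshold, chosen for illustration
) -> List[Tuple[str, int]]:
    """Return (entity name, distance) pairs, closest first."""
    normalized = mention.casefold()

    # Most basic form: exact string match on the normalized surface form.
    exact = [name for name in entity_names if name.casefold() == normalized]
    if exact:
        return [(name, 0) for name in exact]

    # Fuzzy fallback: keep every name within max_distance edits.
    scored = [(name, levenshtein(normalized, name.casefold()))
              for name in entity_names]
    close = [(name, dist) for name, dist in scored if dist <= max_distance]
    return sorted(close, key=lambda pair: (pair[1], pair[0]))


if __name__ == "__main__":
    # Toy knowledge-base labels (assumed); "Par1s" mimics an OCR error.
    names = ["Paris", "Parma", "London"]
    print(generate_candidates("Par1s", names))
```

A fixed edit-distance threshold is the simplest choice here; a length-normalized distance would treat short and long names more evenly, at the cost of one more parameter to tune.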