2021
DOI: 10.48550/arxiv.2109.11406
Preprint

Named Entity Recognition and Classification on Historical Documents: A Survey

Abstract: After decades of massive digitisation, an unprecedented amount of historical documents is available in digital format, along with their machine-readable texts. While this represents a major step forward with respect to preservation and accessibility, it also opens up new opportunities in terms of content mining and the next fundamental challenge is to develop appropriate technologies to efficiently search, retrieve and explore information from this 'big data of the past'. Among semantic indexing opportunities,…

Cited by 6 publications (6 citation statements)
References 115 publications (204 reference statements)
“…Historical and ancient texts pose many challenges for entity extraction, such as spelling variation, language change, and lack of large corpora. The main techniques to extract entities from historical texts include temporal entity extraction, event extraction, and named entity recognition (NER) [25,26]. In many cases, entity extraction is performed manually [27].…”
Section: Entity Extraction (mentioning)
confidence: 99%
“…Therefore, we will review research on historical NLP applications and extractive supervised text summarization methods. For a detailed review, readers may refer to [18], [19] and [20], [21] surveys, respectively.…”
Section: Related Work (mentioning)
confidence: 99%
“…Existing NLP studies on historical documents primarily focus on tasks such as spelling normalization [18], [23], machine translation [24], and sequence labelling, including part-of-speech tagging [25] and named entity recognition [19], [26]. Recently, the success of deep neural networks has introduced new applications in this domain, including sentiment analysis [27], information retrieval [28], event extraction [29], [30], and text classification [31].…”
Section: Historical Natural Language Processing Applications (mentioning)
confidence: 99%
“…Yet, the recognition, classification and disambiguation of NEs in historical texts are not straightforward, and performances are not on par with what is usually observed on contemporary well-edited English news material [3]. In particular, NE processing on historical documents faces the challenges of domain heterogeneity, input noisiness, dynamics of language, and lack of resources [6]. Although some of these issues have already been tackled in isolation in other contexts (with e.g., user-generated text), what makes the task particularly difficult is their simultaneous combination and their magnitude: texts are severely noisy, and domains and time periods are far apart.…”
Section: Introduction (mentioning)
confidence: 99%