2021
DOI: 10.1007/978-3-030-86159-9_21
Handling Heavily Abbreviated Manuscripts: HTR Engines vs Text Normalisation Approaches

Abstract: Although abbreviations are fairly common in handwritten sources, particularly in medieval and modern Western manuscripts, previous research dealing with computational approaches to their expansion is scarce. Yet abbreviations present particular challenges to computational approaches such as handwritten text recognition and natural language processing tasks. Often, pre-processing ultimately aims to lead from a digitised image of the source to a normalised text, which includes expansion of the abbreviations. We …

Cited by 5 publications (9 citation statements)
References 14 publications
“…• a deep generative approach to text line analysis, inspired by deep unsupervised multi-object segmentation approaches and adapted to work in both a weakly supervised and unsupervised setting, • a demonstration of the potential of our approach in challenging applications, particularly ciphered text analysis and paleographic analysis, • an extended evaluation on three very different datasets: a printed volume of the Google1000 dataset [19,46], the Copiale cipher [2,27] and historical handwritten charters from the 12th and early 13th century [6,44].…”
Section: Contribution
confidence: 99%
“…We experiment with three datasets with different characteristics: Google1000 [46], the Copiale cipher [27] and Fontenay manuscripts [6,44]. We check that our method leads to transcription results on par with related baselines using the Character Error Rate metric.…”
Section: Datasets and Metric
confidence: 99%
“…We have notably realized transcriptions that restore the spaces in Arabic, even when there are visually no discernible spaces in the manuscript. In view of the great variety of character morphologies in the Maghrebi manuscript scripts, we favoured the word-based approach over the character-based approach, where word separation is managed in post-processing [5]. where the copyist should have written ṭā' marbūṭa in final position…”
Section: Specifications For Transcription
confidence: 99%