2019 International Conference on Document Analysis and Recognition (ICDAR)
DOI: 10.1109/icdar.2019.00022
Decipherment of Historical Manuscript Images

Abstract: European libraries and archives are filled with enciphered manuscripts from the early modern period. These include military and diplomatic correspondence, records of secret societies, private letters, and so on. Although they are enciphered with classical cryptographic algorithms, their contents are unavailable to working historians. We therefore attack the problem of automatically converting cipher manuscript images into plaintext. We develop unsupervised models for character segmentation, character-image clustering […]

Cited by 16 publications (15 citation statements); references 11 publications.
“…Since transcription and deciphering are usually separate, subsequent tasks, errors in transcription are propagated and heavily affect the decryption. Therefore, we investigate integrating image processing and automatic decryption into a single step (as first proposed by Kevin Knight and presented in a pilot study (Yin et al 2019)) or into an iterated pipeline with feedback. This joint architecture will hopefully speed up the time-consuming transcription, minimize errors, and create synergy effects, since both image processing and automatic decryption rely on statistical language models and on clustering of symbols, which could be shared between them.…”
Section: Discussion (mentioning)
confidence: 99%
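The joint architecture described in the statement above hinges on both components sharing a statistical language model over plaintext characters. Below is a minimal Python sketch of that ingredient, not the cited tools' code: the function names, the Laplace smoothing, and the toy alphabet are all assumptions made for illustration. It trains a character bigram model and uses it to score how plaintext-like a candidate decipherment is.

from collections import Counter
import math

def train_bigram_lm(text, alphabet, alpha=1.0):
    # Smoothed character bigram log-probabilities estimated from a plaintext sample.
    counts = Counter(zip(text, text[1:]))
    totals = Counter(text)
    return {
        (a, b): math.log((counts[(a, b)] + alpha) / (totals[a] + alpha * len(alphabet)))
        for a in alphabet for b in alphabet
    }

def score(candidate_plaintext, lm):
    # Sum of bigram log-probabilities: higher means more plaintext-like.
    return sum(lm[pair] for pair in zip(candidate_plaintext, candidate_plaintext[1:]))

alphabet = "abcdefghijklmnopqrstuvwxyz "
lm = train_bigram_lm("the quick brown fox jumps over the lazy dog ", alphabet)
print(score("the the the", lm) > score("zqx zqx zqx", lm))  # True: English-like text scores higher

In a joint or iterated pipeline of the kind the quote envisions, such a score could be fed back to the symbol-clustering stage to prefer groupings whose decipherments read more like plaintext.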
“…This noise can come from the natural degradation of historical documents, human mistakes during a manual transcription process, or misspelled words by the author, as in the Zodiac-408 cipher. Noise can also come from automatically transcribing historical ciphers using Optical Character Recognition (OCR) techniques (Yin et al, 2019). It is thus crucial to have a robust decipherment model that can still crack ciphers despite the noise.…”
Section: Transcription Noise (mentioning)
confidence: 99%
“…Hauer et al (2014) test their proposed method on noisy ciphers created by randomly corrupting log₂(N) of the ciphertext characters. However, automatic transcription of historical documents is very challenging and can introduce more types of noise, including the addition and deletion of some characters during character segmentation (Yin et al, 2019). We test our model on three types of random noise: insertion, deletion, and substitution.…”
Section: Transcription Noise (mentioning)
confidence: 99%
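As a concrete illustration of the three noise types mentioned in the quoted evaluation, the sketch below injects insertion, deletion, or substitution errors into a ciphertext at a chosen rate. It is a toy under assumptions, not the authors' evaluation code; the corrupt function and its noise_rate parameter are invented names.

import random

def corrupt(ciphertext, noise_rate, alphabet, noise_type, seed=0):
    # Return a noisy copy of `ciphertext`, corrupting roughly
    # noise_rate * len(ciphertext) positions with the chosen noise type.
    rng = random.Random(seed)
    symbols = list(ciphertext)
    n_errors = max(1, int(noise_rate * len(symbols)))
    for _ in range(n_errors):
        i = rng.randrange(len(symbols))
        if noise_type == "substitution":
            symbols[i] = rng.choice(alphabet)
        elif noise_type == "insertion":
            symbols.insert(i, rng.choice(alphabet))
        elif noise_type == "deletion" and len(symbols) > 1:
            del symbols[i]
    return "".join(symbols)

# Example: corrupt 5% of a toy cipher with substitution noise.
noisy = corrupt("XQJZK" * 20, 0.05, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                noise_type="substitution")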
“…In our data, it is not known which signs are truly related to one another; thus, we refrain from giving the model explicit information about compositionality. Yin et al (2019) segment and transcribe undeciphered scripts based on visual similarities between glyphs. Although their transcription error rate is high, they still achieve partial decipherments with no human intervention.…”
Section: Below) (mentioning)
confidence: 99%
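A rough sketch of what transcription by visual similarity can look like in practice follows; it is not the Yin et al. pipeline, and the cluster_glyphs helper, the raw-pixel features, and the use of k-means are all assumptions here. Segmented glyph images are clustered so that cluster ids act as a provisional cipher alphabet, which a decipherment model can then attack.

import numpy as np
from sklearn.cluster import KMeans

def cluster_glyphs(glyphs, n_symbols, seed=0):
    # `glyphs`: array of already-segmented, size-normalised glyph images,
    # shape (n_glyphs, height, width). Flatten pixels and cluster by similarity.
    features = glyphs.reshape(len(glyphs), -1).astype(float)
    labels = KMeans(n_clusters=n_symbols, random_state=seed, n_init=10).fit_predict(features)
    return labels  # cluster ids form a transcription over an invented alphabet

# Example with random binary images as a stand-in for real segmented characters.
fake_glyphs = np.random.rand(200, 32, 32) > 0.5
transcription = cluster_glyphs(fake_glyphs, n_symbols=26)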