Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH) 2014
DOI: 10.3115/v1/w14-0605

A Multilingual Evaluation of Three Spelling Normalisation Methods for Historical Text

Abstract: We present a multilingual evaluation of approaches for spelling normalisation of historical text based on data from five languages: English, German, Hungarian, Icelandic, and Swedish. Three different normalisation methods are evaluated: a simplistic filtering model, a Levenshtein-based approach, and a character-based statistical machine translation approach. The evaluation shows that the machine translation approach often gives the best results, but also that all approaches improve over the baseline and that no…
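
As a rough illustration of what a Levenshtein-based normalisation step might look like, the sketch below maps a historical spelling to the closest entry in a modern word list, provided the edit distance stays within a threshold. This is not the paper's implementation: the function names, the `max_distance` threshold, and the toy lexicon are all assumptions made for the example, and ties are simply resolved in favour of the first lexicon entry found.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalise(token: str, lexicon: list[str], max_distance: int = 2) -> str:
    """Return the nearest lexicon entry within max_distance, else the token unchanged.

    `lexicon` and `max_distance` are hypothetical parameters for this sketch.
    """
    best, best_dist = token, max_distance + 1
    for word in lexicon:
        d = levenshtein(token.lower(), word)
        if d < best_dist:  # strict comparison: the first closest entry wins
            best, best_dist = word, d
    return best


if __name__ == "__main__":
    modern_words = ["old", "cold", "the"]  # toy lexicon, assumed for illustration
    print(normalise("olde", modern_words))
    print(normalise("xyzzy", modern_words))
```

A token with no sufficiently close lexicon entry is left unchanged, so the threshold determines how aggressively spellings are rewritten.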

Cited by 27 publications (43 citation statements)
References 5 publications
“…Table 9 shows results reported by Pettersson et al (2014) on English, German, Hungarian, Icelandic and Swedish, and results by Sánchez-Martínez et al (2013) on Spanish, in addition to our results on Slovene. However, the experimental setups are fairly difficult to compare.…”
Section: Cross-language Comparisons (supporting)
confidence: 68%
“…It corresponds to our Baseline 2, to some similar experiments but with language-specific thresholds for Pettersson et al (2014), and to a comparable approach based on a spellchecker for Sánchez-Martínez et al (2013). Again, this method shows comparatively low accuracy values for Slovene, but similar CER values.…”
Section: Cross-language Comparisons (mentioning)
confidence: 99%
“…Recently, work on non-standard historical varieties has focused on spelling normalization using rule-based, statistical and neural string-transduction models (Pettersson et al, 2014; Bollmann and Søgaard, 2016; Tang et al, 2018). Previous studies on lemmatization of historical variants focused on evaluating off-the-shelf systems.…”
Section: Related Work (mentioning)
confidence: 99%
“…Data-driven approaches, on the other hand, derive their knowledge from large amounts of normalized data. A recent popular data-driven methodology for normalization is the use of character-based statistical machine translation (Schulz et al 2016, Pettersson et al 2014), which partly overcomes the requirement for large amounts of data as the system works at the character level.…”
Section: The Effects Of Normalization On Tagging Accuracy (mentioning)
confidence: 99%