Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, 2019
DOI: 10.18653/v1/w19-2509

Revisiting NMT for Normalization of Early English Letters

Abstract: This paper studies the use of NMT (neural machine translation) as a normalization method for an early English letter corpus. The corpus has previously been normalized so that only the less frequent deviant forms are left unnormalized. This paper discusses different approaches to improving the normalization of these deviant forms. Adding features to the training data is found to be unhelpful, but using a lexicographical resource to filter the top candidates produced by the N…
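The abstract describes filtering the NMT model's top candidates with a lexicographical resource. The following is a minimal sketch of that idea, not the authors' implementation: it assumes an n-best list of normalization candidates from the model and a set of modern headwords from a dictionary; all names and the example data are hypothetical.

```python
def filter_candidates(nbest, lexicon):
    """Pick the highest-ranked NMT candidate that is attested in the lexicon.

    `nbest` is a list of (candidate, score) pairs sorted best-first;
    falls back to the model's top candidate if none is attested.
    """
    for candidate, _score in nbest:
        if candidate.lower() in lexicon:
            return candidate
    return nbest[0][0]


# Hypothetical usage for the deviant form "wolde":
lexicon = {"would", "world", "wild"}
nbest = [("wolde", -0.3), ("would", -0.9), ("wilde", -1.4)]
print(filter_candidates(nbest, lexicon))  # -> "would"
```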

Cited by 13 publications (17 citation statements)
References 4 publications
“…Despite the improvements in the normalization step [15], normalization of the entire CEEC is still a problem that is far from solved. While using synthetic data improves low-resource sequence-to-sequence models, including character-level models [13,16], our experiments with back-translation on the training data available to us have not yielded better accuracies.…”
Section: Discussion
confidence: 99%
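The statement above mentions back-translation as a way to create synthetic data for low-resource, character-level sequence-to-sequence models. A minimal sketch of that general technique follows, assuming a reverse model that maps modern spellings back to historical ones; the reverse model and its interface are hypothetical and do not reflect the cited experiments' actual setup.

```python
def back_translate(modern_words, modern_to_historical):
    """Create synthetic (historical, modern) training pairs by running a
    reverse model over monolingual modern words."""
    return [(modern_to_historical(w), w) for w in modern_words]


# Hypothetical usage with a toy stand-in for the reverse model (u -> v):
fake_reverse = lambda w: w.replace("u", "v")
print(back_translate(["under", "virtue"], fake_reverse))
# [('vnder', 'under'), ('virtve', 'virtue')]
```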
“…Their method tries to first detect a neologism and then normalize it into standard language. Interestingly, they apply the idea of normalization in order to remove neologisms, whereas we use normalization to find neologisms [15].…”
Section: Related Work
confidence: 99%
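The statement above contrasts normalizing neologisms away with using normalization to find them. Below is a minimal sketch of the latter idea, under assumptions not stated in either paper: a token is a neologism candidate when neither its surface form nor its normalized form appears in a reference lexicon. `normalize` is a hypothetical stand-in for a trained normalization model.

```python
def find_neologism_candidates(tokens, lexicon, normalize):
    candidates = []
    for token in tokens:
        if token.lower() in lexicon:
            continue  # already standard language
        if normalize(token).lower() in lexicon:
            continue  # a spelling variant of a known word
        candidates.append(token)  # unknown even after normalization
    return candidates


# Hypothetical usage with an identity normalizer as a stand-in:
lexicon = {"the", "internet", "is", "full", "of", "memes"}
print(find_neologism_candidates(
    ["the", "interwebz", "is", "full", "of", "memes"], lexicon, lambda w: w))
# ['interwebz']
```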
“…More recently, the work of the CEEC has advanced in several directions, including the Academy of Finland funded DIGIHUM project STRATAS (2016-2019), which grappled with the dimensions of spelling variation in the corpus, which is one of the major issues with digital collections of early language data. The NATAS subproject of STRATAS proposed, among other things, NLP-based solutions to the issue of spelling normalization of the corpus [10]. My thanks to Jack for his contribution to this research and for broadening our horizons on present-day low-resource languages [11].…”
Section: Acknowledgements
confidence: 99%
“…Within historical text normalization, a recent study [11] compared various LSTM architectures and found that bi-directional recurrent neural networks (BRNN) were more accurate than one-directional RNNs. Different attention models or deeper architectures did not improve the results further.…”
Section: Related Work
confidence: 99%
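The comparison above concerns one-directional versus bi-directional recurrent encoders for character-level normalization. Below is a minimal PyTorch sketch of that contrast (the framework, vocabulary size, and layer sizes are illustrative assumptions; the cited study does not prescribe them).

```python
import torch
import torch.nn as nn

VOCAB, EMB, HID = 60, 64, 128  # characters, embedding dim, hidden dim

class CharEncoder(nn.Module):
    def __init__(self, bidirectional: bool):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.LSTM(EMB, HID, batch_first=True,
                           bidirectional=bidirectional)

    def forward(self, char_ids):              # (batch, seq_len) of char indices
        outputs, _ = self.rnn(self.embed(char_ids))
        return outputs                        # (batch, seq_len, HID * num_directions)

# A BRNN represents each character with both left and right context,
# which is what the cited comparison found more accurate for normalization:
x = torch.randint(0, VOCAB, (2, 10))          # two dummy "words" of 10 characters
print(CharEncoder(bidirectional=False)(x).shape)  # torch.Size([2, 10, 128])
print(CharEncoder(bidirectional=True)(x).shape)   # torch.Size([2, 10, 256])
```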