Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017
DOI: 10.18653/v1/p17-1031

Learning attention for historical text normalization by learning to pronounce

Abstract: Automated processing of historical texts often relies on pre-normalization to modern word forms. Training encoder-decoder architectures to solve such problems typically requires a lot of training data, which is not available for the named task. We address this problem by using several novel encoder-decoder architectures, including a multi-task learning (MTL) architecture using a grapheme-to-phoneme dictionary as auxiliary data, pushing the state-of-the-art by an absolute 2% increase in performance. We analyze t…
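
The MTL setup sketched in the abstract pairs normalization with grapheme-to-phoneme conversion as an auxiliary task. The following PyTorch snippet is a minimal illustrative sketch of such a shared-encoder, task-specific-decoder design, not the authors' implementation; the class name, hyperparameters, and the assumption of a single shared symbol vocabulary are all hypothetical.

```python
# Minimal MTL sketch (illustrative, not the paper's implementation):
# a shared character encoder with one decoder per task
# (historical normalization and grapheme-to-phoneme conversion).
import torch
import torch.nn as nn

class SharedEncoderMTL(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        # Assumes one shared symbol vocabulary covering input characters,
        # output characters, and phonemes (a simplification for the example).
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Shared bidirectional LSTM encoder over the historical word form.
        self.encoder = nn.LSTM(emb_dim, hidden_dim, bidirectional=True,
                               batch_first=True)
        # Task-specific decoders and output layers.
        self.decoders = nn.ModuleDict({
            "norm": nn.LSTM(emb_dim, 2 * hidden_dim, batch_first=True),
            "g2p": nn.LSTM(emb_dim, 2 * hidden_dim, batch_first=True),
        })
        self.outputs = nn.ModuleDict({
            "norm": nn.Linear(2 * hidden_dim, vocab_size),
            "g2p": nn.Linear(2 * hidden_dim, vocab_size),
        })

    def forward(self, src, tgt, task):
        enc_out, _ = self.encoder(self.embed(src))          # (B, S, 2H)
        # Initialize the decoder from a pooled encoder summary (simplified).
        h0 = enc_out.mean(dim=1).unsqueeze(0).contiguous()  # (1, B, 2H)
        c0 = torch.zeros_like(h0)
        dec_out, _ = self.decoders[task](self.embed(tgt), (h0, c0))
        return self.outputs[task](dec_out)  # per-step logits for this task
```

Training would then alternate mini-batches between the normalization pairs and the grapheme-to-phoneme dictionary, so that the auxiliary task shapes the shared encoder.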

Cited by 28 publications (49 citation statements); references 14 publications.

“…We expected that the bias toward monotonic alignments would help the hard attention model at smaller data sizes, but it is the soft attention model that seems to do better there, while the hard attention model does better in most cases at the larger data sizes. Note that Bollmann et al. (2017) trained their model on individual manuscripts, with no training set containing more than 13.2k tokens. The fact that this model struggles with larger data sizes, especially for seen tokens, suggests that the default hyperparameters may be tuned to work well with small training sets at the cost of underfitting the larger datasets.…”
Section: Results: Normalization Accuracy
confidence: 99%
“…It is therefore critical to report both dataset statistics and experimental results for unseen tokens. Unfortunately, some recent papers have only reported accuracy on all tokens, and only in comparison to other (non-baseline) systems (Bollmann and Søgaard, 2016; Bollmann et al., 2017; Korchagina, 2017). These figures can be misleading if systems underperform the naïve baseline on seen tokens (which we show does happen in practice).…”
Section: Task Setting and Issues of Evaluation
confidence: 99%
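
The evaluation concern raised in this excerpt can be made concrete with a small sketch: report accuracy separately for test tokens that were seen versus unseen in training, and compare against a naive memorization baseline (most frequent training normalization for seen tokens, identity otherwise). The function names and toy word pairs below are hypothetical, not taken from the cited papers.

```python
# Illustrative sketch: accuracy split by seen/unseen tokens, plus a naive
# memorization baseline (most frequent training normalization; identity
# for tokens never seen in training).
from collections import Counter, defaultdict

def naive_baseline(train_pairs):
    """Map each seen historical token to its most frequent normalization."""
    counts = defaultdict(Counter)
    for hist, norm in train_pairs:
        counts[hist][norm] += 1
    return {hist: c.most_common(1)[0][0] for hist, c in counts.items()}

def seen_unseen_accuracy(test_pairs, predictions, seen_vocab):
    """Accuracy on seen and unseen test tokens for a list of predictions."""
    stats = {"seen": [0, 0], "unseen": [0, 0]}  # [correct, total]
    for (hist, gold), pred in zip(test_pairs, predictions):
        bucket = "seen" if hist in seen_vocab else "unseen"
        stats[bucket][0] += int(pred == gold)
        stats[bucket][1] += 1
    return {k: c / t if t else float("nan") for k, (c, t) in stats.items()}

# Toy usage (hypothetical data):
train = [("vnd", "und"), ("vnd", "und"), ("jn", "in")]
test = [("vnd", "und"), ("seyn", "sein")]
lookup = naive_baseline(train)
baseline_preds = [lookup.get(hist, hist) for hist, _ in test]
print(seen_unseen_accuracy(test, baseline_preds, set(lookup)))
```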
“…Multi-task learning (MTL) and semi-supervised learning are both successful paradigms for learning in scenarios with limited labelled data and have in recent years been applied to almost all areas of NLP. Applications of MTL in NLP, for example, include partial parsing, text normalisation (Bollmann et al., 2017), neural machine translation (Luong et al., 2016), and keyphrase boundary classification (Augenstein and Søgaard, 2017).…”
Section: Introduction
confidence: 99%
“…Model: We use the same encoder-decoder architecture with attention as described in Bollmann et al. (2017). This is a fairly standard model consisting of one bidirectional LSTM unit in the encoder and one (unidirectional) LSTM unit in the decoder.…”
Section: Methods
confidence: 99%
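
For reference, a compact sketch of the kind of architecture this excerpt describes: a bidirectional LSTM encoder over input characters and a unidirectional LSTM decoder with soft attention over the encoder states. The class name, dimensions, and dot-product attention scoring are illustrative assumptions, not the exact configuration of Bollmann et al. (2017).

```python
# Compact sketch (illustrative) of an attentional encoder-decoder:
# bidirectional LSTM encoder, unidirectional LSTM decoder with soft attention.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveSeq2Seq(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, bidirectional=True,
                               batch_first=True)
        # Decoder input = previous output symbol embedding + attention context.
        self.decoder = nn.LSTMCell(emb_dim + 2 * hidden_dim, 2 * hidden_dim)
        self.out = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, src, tgt):
        enc_out, _ = self.encoder(self.embed(src))       # (B, S, 2H)
        h = enc_out.new_zeros(src.size(0), enc_out.size(-1))
        c = torch.zeros_like(h)
        logits = []
        for t in range(tgt.size(1)):
            # Soft attention: dot-product scores against all encoder states.
            scores = torch.bmm(enc_out, h.unsqueeze(-1)).squeeze(-1)   # (B, S)
            weights = F.softmax(scores, dim=-1)
            context = torch.bmm(weights.unsqueeze(1), enc_out).squeeze(1)
            step_in = torch.cat([self.embed(tgt[:, t]), context], dim=-1)
            h, c = self.decoder(step_in, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)  # (B, T, vocab_size)
```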