Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2019)
DOI: 10.18653/v1/n19-1153
Improving Lemmatization of Non-Standard Languages with Joint Learning

Abstract: Lemmatization of standard languages is concerned with (i) abstracting over morphological differences and (ii) resolving token-lemma ambiguities of inflected words in order to map them to a dictionary headword. In the present paper we aim to improve lemmatization performance on a set of non-standard historical languages in which the difficulty is increased by an additional aspect (iii): spelling variation due to lacking orthographic standards. We approach lemmatization as a string-transduction task with an encod…
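To make the "lemmatization as string transduction" framing concrete, the following is a minimal, hypothetical sketch that learns character-level suffix-rewrite rules from (token, lemma) pairs and applies the longest matching rule to unseen tokens. All names and the toy data are illustrative assumptions; this is a simple rule baseline, not the paper's encoder-decoder model.

```python
# Toy illustration of token -> lemma string transduction:
# learn suffix-rewrite rules from (token, lemma) pairs, then
# apply the longest matching suffix rule to an unseen token.
# This is a hypothetical baseline, NOT the paper's neural model.

def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest shared prefix of two strings."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def learn_suffix_rules(pairs):
    """Map each token suffix to the lemma suffix that replaces it."""
    rules = {}
    for token, lemma in pairs:
        k = common_prefix_len(token, lemma)
        rules[token[k:]] = lemma[k:]
    return rules

def lemmatize(token: str, rules: dict) -> str:
    """Apply the longest-suffix rule that matches; fall back to identity."""
    for i in range(len(token)):  # i = 0 tries the longest suffix first
        suffix = token[i:]
        if suffix in rules:
            return token[:i] + rules[suffix]
    return token

# Toy training pairs (illustrative only)
pairs = [("walked", "walk"), ("cities", "city"), ("running", "run")]
rules = learn_suffix_rules(pairs)
print(lemmatize("talked", rules))  # -> talk ("ed" -> "")
print(lemmatize("ponies", rules))  # -> pony ("ies" -> "y")
```

A rule table like this handles regular inflection but breaks down exactly on aspects (ii) and (iii) above: ambiguous forms and free spelling variation, which is why the paper moves to a learned encoder-decoder transducer instead.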


Cited by 18 publications (21 citation statements)
References 26 publications
“…The literature concerning historical text normalisation is considerably larger, and includes approaches based on substitution lists, rules, as well as distance-based, statistical approaches (in particular, character-based neural machine translation) and more recently neural models [4]. To include a modelling of context, normalisation systems can reuse deep-learning architectures originally intended for neural machine translation [4,9] or lemmatisation [6,15].…”
Section: Computational Approaches
Mentioning confidence: 99%
“…Results are shown in Table 3. Word normalisation: for abbreviation expansion, the neural tagger Pie was trained [15], following a setup already used for Old French [6]. It was trained on an aligned version of the previous D-abb and D-exp (with word separation and line hyphenations normalised) alone, and with the addition of the Oriflamms data.…”
Section: Text Normalisation Approach
Mentioning confidence: 99%
“…For the Vulgate, we follow the traditional segmentation into verses, which amounts to 36,663 documents. Both collections were lemmatized using the neural Latin lemmatizer provided by the software library pie (Manjavacas et al, 2019a).…”
Section: Dataset
Mentioning confidence: 99%
“…We use Pie [Manjavacas et al 2019] as our main training and tagging software. For training, Pie is used for lemma- and POS-tagging.…”
Section: Set Up
Mentioning confidence: 99%