2010
DOI: 10.1093/llc/fqq011

Weigh your words--memory-based lemmatization for Middle Dutch

citations
Cited by 24 publications
(28 citation statements)
references
References 9 publications
“…As even a single scribe often used different spellings for the same word, modern editors already tend to silently normalize minor orthographic variants. We have normalized the orthography in our corpus even further via lemmatization, a useful procedure in stylometry for medieval texts (Kestemont et al, 2010). The texts were first tokenized using the Natural Language Toolkit (Bird et al, 2009).…”
mentioning
confidence: 99%
“…To generate these variants, we constructed an array with all possible variations for the consecutive character groups. Next, we combined these options through the Cartesian product in the matrix by means of a permutation algorithm (Kestemont et al, 2010). Table 1 lists the series of common alternative character combinations we have considered, loosely based on Riggs (1996).…”
mentioning
confidence: 99%
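
The variant-generation step described in the statement above can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the `ALTERNATIVES` table, the `segment` helper, and the Latin example word are hypothetical stand-ins (the quoted paper's Table 1 is not reproduced here), and the longest-match segmentation is an assumption about how the consecutive character groups are identified.

```python
from itertools import product

# Hypothetical alternation table: each key is a character group and the
# value lists spellings treated as interchangeable. These entries are
# illustrative assumptions, not the contents of the cited Table 1.
ALTERNATIVES = {
    "ae": ["ae", "e"],
    "ti": ["ti", "ci"],
    "y": ["y", "i"],
}


def segment(word, alternatives):
    """Split a word into consecutive groups, trying the longest keys first.

    Each group is returned as the list of spellings allowed at that position;
    characters without a table entry become single-option groups.
    """
    keys = sorted(alternatives, key=len, reverse=True)
    groups, i = [], 0
    while i < len(word):
        for key in keys:
            if word.startswith(key, i):
                groups.append(alternatives[key])
                i += len(key)
                break
        else:
            # No alternation applies here, so keep the character as-is.
            groups.append([word[i]])
            i += 1
    return groups


def spelling_variants(word, alternatives=ALTERNATIVES):
    """Combine the per-group options through their Cartesian product."""
    groups = segment(word, alternatives)
    return sorted({"".join(combo) for combo in product(*groups)})


print(spelling_variants("gratiae"))
```

With this table, `gratiae` has two alternating groups (`ti` and `ae`), so the product yields four candidate spellings. The number of variants grows multiplicatively with the number of alternating groups, which is worth keeping in mind for long words.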
“…One may wonder how the annotation projection approach performs in comparison to direct applications of modern language NLP tools to normalized historical data language (Scheible et al, 2011). While it is unlikely that such an approach could scale beyond closely related varieties, successful experiments on the annotation of normalized historical language have been reported, although mostly focused on token-level annotations (POS, lemma, morphology) of language stages which syntax does not greatly deviate from modern rules (Rayson et al, 2007; Pennacchiotti and Zanzotto, 2008; Kestemont et al, 2010; Bollmann, 2013). For the annotation of more remotely related varieties with more drastic differences in word order rigidity or morphology as considered here, however, projection techniques are more promising as they have been successfully applied to unrelated languages, as well, but still benefit from diachronic proximity, cf.…”
Section: Discussion
mentioning
confidence: 99%
“…This approach has been followed by Kestemont et al (2010) and Barteld et al (2016) for lemmatization of historical texts.…”
Section: Related Work
mentioning
confidence: 99%