Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage 2017
DOI: 10.1145/3078081.3078100

A Memory-Based Lemmatizer for Ancient Greek

Abstract: In this paper we present the lemmatizer that we developed for Ancient Greek: GLEM. As far as we know, GLEM is the first publicly available lemmatizer for Ancient Greek that uses POS information to disambiguate and that also assigns output to unseen words, words that are not yet in the lexicon. As the basis for the lemmatizer we used an existing memory-based learning tool, Frog, that was originally developed for Dutch and that we converted to work for Ancient Greek. As the results of Frog on Ancient Greek were rat…

Cited by 5 publications (9 citation statements)
References 2 publications
“…These scores suggest that Diorisis outperforms other corpora and lemmatizers on both samples. glem scores surprisingly low compared to the reported accuracy of 93% on an unseen text reported by its developers (Bary et al 2017).…”
Section: 16 (mentioning)
confidence: 70%
“…glem chooses the lemma of the most frequent word form/part-of-speech combination according to the proiel lexicon whenever a disambiguation purely based on part of speech is not possible. Bary et al (2017) show that glem outperforms both a lemmatizer based on Frog and the cltk lemmatizer on an unseen text, with an accuracy of 93.0% vs. 75.6% and 76.6%, respectively, making it the state-of-the-art system for Ancient Greek in 2017.…”
Section: State Of the Art (mentioning)
confidence: 96%
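
The fallback described in the statement above can be illustrated with a minimal sketch. It is not GLEM's actual code: the lexicon layout (a mapping from word form, POS tag and lemma to a frequency), the function name choose_lemma and the placeholder entries are all assumptions made for illustration, and the handling of unseen words is left out.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lexicon layout: (word form, POS tag, lemma) -> frequency.
Lexicon = Dict[Tuple[str, str, str], int]


def choose_lemma(form: str, pos: str, lexicon: Lexicon) -> Optional[str]:
    """Pick a lemma for `form`, given a predicted POS tag.

    Step 1: if the POS tag leaves exactly one lemma, return it.
    Step 2: otherwise fall back to the lemma of the most frequent
            form/POS combination recorded in the lexicon.
    Returns None for forms missing from the lexicon (unseen words
    are outside the scope of this sketch).
    """
    # All (POS, lemma) analyses recorded for this word form.
    candidates = {(p, l): n for (f, p, l), n in lexicon.items() if f == form}
    if not candidates:
        return None

    # Step 1: disambiguate purely on part of speech.
    matching = {l for (p, l) in candidates if p == pos}
    if len(matching) == 1:
        return next(iter(matching))

    # Step 2: frequency fallback. Prefer analyses with the predicted POS;
    # if there are none, consider every analysis of the form.
    pool = {k: n for k, n in candidates.items() if k[0] == pos} or candidates
    (_, lemma), _ = max(pool.items(), key=lambda item: item[1])
    return lemma
```

In this sketch a POS tag that matches no lexicon entry for the form falls through to the overall most frequent analysis; whether GLEM behaves the same way in that case is not stated in the quoted text.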
“…As it was in betacode we converted it into unicode (utf8) using a converter created by the Classical Language Toolkit (Johnson and others, 2016). As mentioned before, we combine the manual reports annotation with automated POS-tagging and lemmatization (Bary et al, 2017), which we developed independently and which is open source.…”
Section: Rag: a Greek Corpus Annotated For Reports (mentioning)
confidence: 99%