2020
DOI: 10.1163/15699846-02002001

Lemmatization for Ancient Greek

Abstract: This article presents the result of accuracy tests for currently available Ancient Greek lemmatizers and recently published lemmatized corpora. We ran a blinded experiment in which three highly proficient readers of Ancient Greek evaluated the output of the CLTK lemmatizer, of the CLTK backoff lemmatizer, and of GLEM, together with the lemmatizations offered by the Diorisis corpus and the Lemmatized Ancient Greek Texts repository. The texts chosen for this experiment are Homer, Iliad 1.1–279 and Lysias 7. The …

Cited by 8 publications
(8 citation statements)
References 3 publications
“…For the poetic data, lemmatization accuracy is a little lower than the prose data: accuracy ranges from 0.965 (comedies) to 0.975 (epic poetry) for the poetic data, while most prose genres have an accuracy of more than 0.980 (with oratory and biblical texts on the high end): see Table 4. The lemmas are generally consistent with the LSJ lexicon as well as the lemmas included in the Morpheus codebase (which is largely based on LSJ). […] are not entirely comparable, however: our training set is different than the data that the tools used by Vatri and McGillivray (2020) […]”
Section: Part-of-speech and Morphology
Citation type: mentioning
confidence: 96%
“…Lemmatizers have even been developed for ancient inscriptions such as those written in Ancient Greek (Vatri and McGillivray 2020), Early Irish (i.e. Old and Middle Irish) (Dereza 2018), Classical Armenian, Old Georgian and Syriac (Vidal-Gorène and Kindt 2020), Akkadian (Sahala et al 2023), and additionally for palaeographic 11th century stone inscriptions as well (Ezhilarasi and Maheswari 2021b).…”
Section: Lemmatization
Citation type: mentioning
confidence: 99%
“…Qi et al, 2020). Vatri and McGillivray (2020) compare lemmatizers for Ancient Greek based on dictionary lookup that exploit PoS information to distinguish ambiguous tokens. Alternatively, some approaches do not rely on this type of information at all (e.g.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
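
The last citation statement describes lemmatizers based on dictionary lookup that use PoS information to distinguish ambiguous tokens. A minimal sketch of that general idea follows. It is not the implementation of CLTK, GLEM, or any other tool named on this page: the lexicon, the tag names, and the functions `nfc` and `lemmatize` are all hypothetical and chosen for illustration only.

```python
import unicodedata


def nfc(text: str) -> str:
    """Normalize to NFC so precomposed and decomposed Greek compare equal."""
    return unicodedata.normalize("NFC", text)


# Hypothetical toy lexicon: surface form -> {PoS tag: lemma}.
# The entries are illustrative and not taken from any real resource.
TOY_LEXICON = {
    nfc("ἄρχων"): {"noun": nfc("ἄρχων"), "verb": nfc("ἄρχω")},
    nfc("μάχῃ"): {"noun": nfc("μάχη"), "verb": nfc("μάχομαι")},
    nfc("λόγοις"): {"noun": nfc("λόγος")},
}


def lemmatize(token, pos=None):
    """Look up a token; use the PoS tag to choose among candidate lemmas.

    Returns the lemma, the normalized token itself if the form is unknown,
    or None if the form is ambiguous and the PoS tag does not resolve it.
    """
    form = nfc(token)
    candidates = TOY_LEXICON.get(form)
    if candidates is None:
        return form  # unknown form: fall back to the token itself
    if pos in candidates:
        return candidates[pos]  # the PoS tag resolves the ambiguity
    if len(candidates) == 1:
        return next(iter(candidates.values()))  # only one possible lemma
    return None  # ambiguous, and no usable PoS information


if __name__ == "__main__":
    print(lemmatize("ἄρχων", "noun"))
    print(lemmatize("ἄρchων".replace("ch", "χ"), "verb"))
    print(lemmatize("ἄρχων"))
```

In this sketch the same surface form maps to different lemmas depending on the tag supplied, and an ambiguous form with no tag is left unresolved rather than guessed.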