Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019) 2019
DOI: 10.18653/v1/w19-7812

Creating, Enriching and Valorizing Treebanks of Ancient Greek

Abstract: This paper shows the extent to which treebanks of Ancient Greek play a central role in the ongoing Pedalion project at the University of Leuven. Building on diverse treebanks readily available today, the project aims to make progress in the automated parsing of classical and postclassical Greek texts. Rather than developing new technology as such, our project endeavours to make deliberate and methodical use of the technology that already exists, essentially by combining and adapting both technology and data. T…

Citations: cited by 9 publications (5 citation statements). References: 10 publications.

“…Such a pre-processing pipeline consists of three parts: a tokenizer to split the words and punctuation, a part-of-speech tagger to perform morphological analysis and a lemmatiser to provide every word with its lemma. Such pipelines already exist for ancient Greek (Crane 1991, Keersmaekers et al 2019) but those are dictionary-based approaches that cannot deal with inconsistent orthography or out-of-vocabulary words. For example, a dictionary-based approach will be able to analyse βιβλίου as a genitive from βιβλίον ("book"), but the variant βοιβληου, as it occurs in Occurrence 32232, will not be recognised as such.…”
Section: III.1 NLP Subproject (mentioning; confidence: 99%)
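To make the failure mode concrete, here is a minimal Python sketch contrasting exact dictionary lookup with a lookup over normalised keys. The one-entry lexicon, the function names and the itacism rules (merging ει, οι and η into ι after stripping diacritics) are hypothetical simplifications chosen for this example; they are not the method of Crane (1991), Keersmaekers et al. (2019) or the citing project.

```python
import unicodedata

# Hypothetical one-entry lexicon: inflected form -> (lemma, analysis).
LEXICON = {
    "βιβλίου": ("βιβλίον", "noun, genitive singular"),
}

def lookup_exact(form):
    """Dictionary-based analysis: succeeds only if this exact spelling is listed."""
    return LEXICON.get(form)

def strip_diacritics(text):
    """Drop accents and breathings by decomposing (NFD) and removing combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Hypothetical itacism rules: these vowels and diphthongs merged with /i/
# in post-classical pronunciation, so writers often confused them.
ITACISTIC = [("ει", "ι"), ("οι", "ι"), ("η", "ι")]

def normalise(form):
    """Map a spelling to a coarse key shared by its itacistic variants."""
    key = strip_diacritics(form.lower())
    for old, new in ITACISTIC:
        key = key.replace(old, new)
    return key

# Index the same lexicon a second time under normalised keys.
NORMALISED = {normalise(form): analysis for form, analysis in LEXICON.items()}

def lookup_normalised(form):
    """Try the exact spelling first, then fall back to the normalised key."""
    return lookup_exact(form) or NORMALISED.get(normalise(form))

print(lookup_exact("βιβλίου"))        # ('βιβλίον', 'noun, genitive singular')
print(lookup_exact("βοιβληου"))       # None
print(lookup_normalised("βοιβληου"))  # ('βιβλίον', 'noun, genitive singular')
```

Collapsing spellings this way also merges genuinely distinct forms, so a fallback of this kind could at best propose candidates for a statistical tagger or a human annotator to confirm.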
“…ongoing dependency treebank initiatives for Ancient Greek will be briefly described here (see also Keersmaekers et al 2019). The two most extensive projects that exist to date are the Perseus Ancient Greek (and Latin) Dependency Treebanks (AGDT) (Bamman, Mambrini, and Crane 2009) and the PROIEL Treebank (Haug and Jøhndal 2008).…”
Section: State of the Field and Related Work (mentioning; confidence: 99%)
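The dependency treebanks named here store each sentence as a list of tokens, each pointing to its syntactic head. The AGDT and PROIEL data are also distributed through Universal Dependencies in the CoNLL-U format (ten tab-separated columns per token). The minimal Python sketch below reads that format under simplifying assumptions; the sentence and its annotation are invented for illustration and not taken from either treebank, and the helpers `parse_conllu` and `dependents` are hypothetical, not part of any project's tooling.

```python
from dataclasses import dataclass

@dataclass
class Token:
    id: int
    form: str
    lemma: str
    upos: str
    head: int    # ID of the syntactic head; 0 marks the sentence root
    deprel: str  # dependency relation to the head

def parse_conllu(text):
    """Read CoNLL-U text into a list of sentences, each a list of Tokens.

    Comment lines (#), multiword-token ranges ("1-2") and empty nodes ("1.1")
    are skipped; this sketch keeps only 6 of the 10 columns.
    """
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        current.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                             int(cols[6]), cols[7]))
    if current:  # the input may not end with a blank line
        sentences.append(current)
    return sentences

def dependents(sentence, head_id):
    """All tokens whose head is head_id (use 0 to find the root)."""
    return [tok for tok in sentence if tok.head == head_id]

# Invented example ("the man writes a book"), annotated by hand in UD style;
# NOT taken from any treebank.
SAMPLE = "\n".join([
    "# text = ὁ ἄνθρωπος γράφει βιβλίον",
    "1\tὁ\tὁ\tDET\t_\t_\t2\tdet\t_\t_",
    "2\tἄνθρωπος\tἄνθρωπος\tNOUN\t_\t_\t3\tnsubj\t_\t_",
    "3\tγράφει\tγράφω\tVERB\t_\t_\t0\troot\t_\t_",
    "4\tβιβλίον\tβιβλίον\tNOUN\t_\t_\t3\tobj\t_\t_",
])

sentence = parse_conllu(SAMPLE)[0]
root = dependents(sentence, 0)[0]
print(root.lemma)  # γράφω
print([(t.form, t.deprel) for t in dependents(sentence, root.id)])
# [('ἄνθρωπος', 'nsubj'), ('βιβλίον', 'obj')]
```

Because every token records only its head, the tree is recovered by following head indices; walking down from the token whose head is 0 visits the whole sentence.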
“…It is also worthwhile to mention the Harrington Trees, containing, among other texts, Lucian's True Histories (Harrington 2018). Finally, for this project, we developed our own collection of syntactic trees, the Pedalion Treebanks (Keersmaekers et al 2019), including classical and post-classical prose and poetry, with a special focus on genres and authors that are less well represented in the major treebanking projects (see Sections 4.1 and 6 for more details). This outline shows that, until now, there has been a strong emphasis on setting up treebank projects and establishing annotation conventions.…”
Section: State of the Field and Related Work (mentioning; confidence: 99%)
“…While the projects mentioned above only include the full text, there have also been some efforts to add linguistic annotation. A wide variety of treebanking projects have manually annotated Greek texts for morphology, lemmas, (dependency) syntax and sometimes semantics, most prominently the PROIEL project (Haug and Jøhndal, 2008; 277,000 tokens), the Ancient Greek Dependency Treebanks (AGDT; Bamman et al, 2009; 560,000 tokens), the Gorman trees (Gorman, 2020; 324,000 tokens) and the Pedalion project (Keersmaekers et al, 2019; 320,000 tokens), as well as some smaller projects (in total, the manually annotated work includes about 1.5 million tokens). The former two projects are also included in the Universal Dependencies (UD) project (Nivre et al, 2016).…”
Section: Related Work (mentioning; confidence: 99%)
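As a quick check on the figures quoted in the last citation statement, the four projects listed with token counts already account for almost all of the reported total. The remainder attributed to the smaller projects is only a rough indication, since the total is given as "about 1.5 million"; the dictionary below simply restates the quoted counts.

```python
# Token counts as quoted in the citation statement (manually annotated treebanks).
CORPORA = {
    "PROIEL": 277_000,
    "AGDT": 560_000,
    "Gorman trees": 324_000,
    "Pedalion": 320_000,
}

listed = sum(CORPORA.values())
reported_total = 1_500_000  # given only as "about 1.5 million tokens"

print(f"{listed:,}")                   # 1,481,000
print(f"{reported_total - listed:,}")  # 19,000 (rough share of the smaller projects)

# Share of each listed project within the listed total.
for name, tokens in CORPORA.items():
    print(f"{name}: {tokens / listed:.1%}")
```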