This paper shows the extent to which treebanks of Ancient Greek play a central role in the ongoing Pedalion project at the University of Leuven. Building on diverse treebanks readily available today, the project aims to make progress in the automated parsing of classical and postclassical Greek texts. Rather than developing new technology as such, our project endeavours to make deliberate and methodical use of the technology that already exists, essentially by combining and adapting both technology and data. This contribution offers a 'roadmap' of our project, surveying (a) the existing work on which we can rely, (b) the strategies which we adopt to reach better results in the automated processing of Ancient Greek and (c) the deliverables that have already been realised or are forthcoming.
The Egyptian-Greek contact situation lasted almost a thousand years, and many documents from this period have been preserved. In this paper, we apply a new quantitative approach to this rich corpus of documentary papyri to map the relationships between the linguistic variables (the variant spellings) and several non-linguistic variables. A multidimensional scaling of the co-occurrences of the linguistic variables shows that there is a strong association between most of the Greek variant spellings that can be explained by Egyptian phonological transfer, while others do not typically co-occur with them. Several new linguistic variables not yet connected to Egyptian phonological transfer also show a strong relation with the first group of features, some of them representing the same phonological transfer processes. A comparison of the contexts in which these variables are used allows us to further substantiate this observation: several of the previously and newly Egyptian-associated variables turn out to have a strong correlation with bilingual Egyptian-Greek documents or occur in Egyptian-dominated environments. The spelling variants are chronologically dependent, and different features are typically associated with different historical periods, illustrating changes taking place in the Egyptian-Greek contact variety over time. A multiple correspondence analysis shows that the variables strongly interact, illustrating the importance of a multifactorial approach combining various linguistic and non-linguistic factors.
This paper explores how to syntactically parse Ancient Greek texts automatically and maps ways of fruitfully employing the results of such an automated analysis. Special attention is given to documentary papyrus texts, a large diachronic corpus of non-literary Greek, which presents a unique set of challenges to tackle. By making use of the Stanford Graph-Based Neural Dependency Parser, we show that through careful curation of the parsing data and several manipulation strategies, it is possible to achieve a Labeled Attachment Score of about 0.85 for this corpus. We also explain how the data can be converted back to its original (Ancient Greek Dependency Treebanks) format. We describe the results of several tests we have carried out to improve parsing results, with special attention paid to the impact of the annotation format on parser performance. In addition, we offer a detailed qualitative analysis of the remaining errors, including possible ways to solve them. Moreover, the paper gives an overview of the valorisation possibilities of an automatically annotated corpus of Ancient Greek texts in the fields of linguistics, language education and humanities studies in general. The concluding section critically analyses the remaining difficulties and outlines avenues to further improve the parsing quality and the ensuing practical applications.
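For readers unfamiliar with the metric, the Labeled Attachment Score mentioned above is the share of tokens for which a parser predicts both the correct head and the correct dependency label. The following is a minimal sketch of that calculation, not the authors' evaluation script: the function name `attachment_scores`, the `(head, label)` tuple layout and the toy sentence are all assumptions made for illustration.

```python
from typing import List, Tuple

# One token's analysis: (index of its head, dependency label).
# This tuple layout is an assumption for illustration only.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], predicted: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for two equally long lists of arcs.

    UAS counts tokens whose predicted head is correct;
    LAS additionally requires the dependency label to be correct.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")

    total = len(gold)
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    labeled_hits = sum(g == p for g, p in zip(gold, predicted))
    return head_hits / total, labeled_hits / total


# Hypothetical four-token sentence; heads and labels are invented.
gold = [(2, "SBJ"), (0, "PRED"), (2, "OBJ"), (3, "ATR")]
predicted = [(2, "SBJ"), (0, "PRED"), (2, "ADV"), (2, "ATR")]

uas, las = attachment_scores(gold, predicted)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")
```

In this toy case three of four heads are right but only two tokens also carry the right label, so the labeled score is lower than the unlabeled one. A score of about 0.85 therefore means that roughly 85 tokens in 100 receive both the correct head and the correct label.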
An evolutionary approach to historical linguistics can be enlightening when not only the mechanisms, but also the statistical methods are considered from neighboring disciplines. In this short paper, we apply survival analysis to investigate what factors determine the lifespan of words. Our case study is on post-classical Greek from the 4th century BC to the beginning of the 8th century AD. We find that lower-frequency and phonetically longer lexemes suffer earlier deaths. Furthermore, verbs turn out to have higher survival rates than adjectives and nouns.
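As a rough illustration of what survival analysis estimates, the sketch below implements the standard Kaplan-Meier estimator in plain Python. It is not taken from the paper, which does not state here which estimator or model it uses; the function name `kaplan_meier` and the toy lifespans are assumptions. A lexeme still attested at the end of the observation window is treated as censored rather than dead.

```python
from typing import List, Tuple


def kaplan_meier(durations: List[float], observed: List[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival curve.

    durations: how long each item (e.g. a lexeme) was followed.
    observed:  True if the item's "death" was seen at that time,
               False if it was censored (still alive when observation ended).
    Returns (time, survival probability) at each time with at least one death.
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")

    pairs = sorted(zip(durations, observed))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        leaving = 0
        # Collect every item whose duration ends at time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += 1 if pairs[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Dead and censored items both leave the risk set after time t.
        at_risk -= leaving
    return curve


# Invented lifespans in centuries; False marks a word still in use
# at the end of the observation window.
lifespans = [2, 3, 3, 5, 8]
died = [True, True, False, True, False]

for time, prob in kaplan_meier(lifespans, died):
    print(f"after {time} centuries: {prob:.2f} still surviving")
```

Comparing such curves across groups (for example verbs versus nouns, or frequent versus rare lexemes) is one common way to express findings like those summarised above.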