Word meaning changes over time, depending on linguistic and extra-linguistic factors. Associating a word with its correct meaning in its historical context is a central challenge in diachronic research, and is relevant to a range of NLP tasks, including information retrieval and semantic search in historical texts. Bayesian models for semantic change have emerged as a powerful tool to address this challenge, providing explicit and interpretable representations of semantic change phenomena. However, while corpora typically come with rich metadata, existing models are limited by their inability to exploit contextual information (such as text genre) beyond the document timestamp. This is particularly critical in the case of ancient languages, where the lack of data and the long diachronic span make it harder to draw a clear distinction between polysemy (the fact that a word has several senses) and semantic change (the process of acquiring, losing, or changing senses), and current systems perform poorly on these languages. We develop GASC, a dynamic semantic change model that leverages categorical metadata about the texts' genre to boost inference and uncover the evolution of meanings in Ancient Greek corpora. In a new evaluation framework, our model achieves improved predictive performance compared to the state of the art.
The Diorisis Ancient Greek Corpus is a digital collection of Ancient Greek texts (from Homer to the early fifth century AD) compiled for linguistic analyses, and specifically with the purpose of developing a computational model of semantic change in Ancient Greek. The corpus consists of 820 texts sourced from open-access digital libraries. The texts have been automatically enriched with morphological information for each word. Lemmatization (the automatic assignment of words to the correct dictionary entry) has been disambiguated using a part-of-speech tagger (a computer programme that selects the part of speech to which an ambiguous word belongs).
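The idea of using a part-of-speech tag to disambiguate lemmatization can be illustrated with a minimal sketch. This is not the Diorisis pipeline: the lexicon, the function name `disambiguate_lemma`, and the English toy forms are all assumptions chosen for readability, and the predicted part of speech is simply passed in rather than produced by a real tagger.

```python
from typing import Dict, List, Optional, Tuple

Candidate = Tuple[str, str]  # (lemma, part of speech)

# Hypothetical toy lexicon (English forms, for illustration only):
# each surface form maps to every dictionary entry it could belong to.
TOY_LEXICON: Dict[str, List[Candidate]] = {
    "saw": [("saw", "noun"), ("see", "verb")],
    "leaves": [("leaf", "noun"), ("leave", "verb")],
    "dog": [("dog", "noun")],
}


def disambiguate_lemma(
    form: str,
    predicted_pos: str,
    lexicon: Dict[str, List[Candidate]] = TOY_LEXICON,
) -> Optional[str]:
    """Pick a lemma for `form`, using the tagger's predicted part of speech.

    Returns None when the form is unknown or remains ambiguous.
    """
    candidates = lexicon.get(form, [])
    if not candidates:
        return None  # unknown form
    if len(candidates) == 1:
        return candidates[0][0]  # unambiguous: no tagger needed
    # Keep only the candidates whose part of speech matches the tagger output.
    matching = [lemma for lemma, pos in candidates if pos == predicted_pos]
    if len(matching) == 1:
        return matching[0]
    return None  # the part of speech alone does not resolve the ambiguity


if __name__ == "__main__":
    print(disambiguate_lemma("saw", "verb"))
    print(disambiguate_lemma("leaves", "noun"))
```

In this sketch the tagger is only consulted when a form has more than one candidate lemma, so an unambiguous form keeps its lemma even if the tagger's prediction is wrong.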
Language is a complex and dynamic system. If we consider word meaning, which is the scope of lexical semantics, we observe that some words have several meanings, thus displaying lexical polysemy. In this article, we present the first phase of a project that aims at computationally modelling Ancient Greek semantics over time. Our system is based on Bayesian learning and on the Diorisis Ancient Greek Corpus, which we have built for this purpose. We illustrate preliminary results in light of expert annotation, and take this opportunity to discuss the role of computational systems and human analysis in a complex research area like historical semantics. On the one hand, computational approaches allow us to model large corpora of texts. On the other hand, a long and rich scholarly tradition in Ancient Greek has provided us with valuable insights into the mechanisms of semantic change (cf. e.g. Leiwo, M. (2012). Introduction: variation with multiple faces. In Leiwo, M., Halla-aho, H., and Vierros, M. (eds), Variation and Change in Greek and Latin. Helsinki: Suomen Ateenan-instituutin säätiö, pp. 1–11). In this article, we show that these qualitative analyses can be leveraged to support and complement the computational modelling.