Although large collections of treebanks can now be used and queried from a single place on the web, we are still far from achieving real interconnection, both among treebanks themselves and with other (kinds of) linguistic resources. Yet making resources interoperable is crucial for maximizing the contribution of each single resource, as well as for accounting for the linguistic complexity of the texts provided by (annotated) corpora, and particularly by treebanks. This paper describes how dependency treebanks are interlinked in a Knowledge Base of linguistic resources for Latin based on Linked Open Data practices and standards. The Knowledge Base is built to make linguistic resources interact by integrating all types of annotation applied to a particular word/text into a common representation.
In Ancient Greek, as in other languages, whenever agreement is triggered by two or more coordinated phrases, two different constructions are allowed: agreement can be controlled either by the coordinated phrase as a whole or by just one of the coordinated words. Despite the amount of information on this topic in grammars of Ancient Greek, much remains unknown even at a general descriptive level. More importantly, the data still lack a convincing explanation. In this paper, we focus on a specific domain of agreement (subject-verb agreement) and on one morphological feature that is expected to covary (number). We discuss number agreement with conjoined phrases, revising some of the modern hypotheses with the support of the empirical evidence that can be collected from the available syntactically annotated corpora of Ancient Greek (treebanks). Results are interpreted according to syntactic features, cognitive factors and semantic properties of the coordinated phrases.
The interoperability between lemmatized corpora of Latin and other resources that use the lemma as indexing key is hampered by the multiple lemmatization strategies that different projects adopt. In this paper we discuss how we tackle the challenges raised by harmonizing different lemmatization criteria in a project that aims to connect linguistic resources for Latin using the Linked Data paradigm. The paper introduces the architecture supporting an open-ended, lemma-based Knowledge Base, built to make textual and lexical resources for Latin interoperable. In particular, the paper describes the inclusion in the Knowledge Base of its lexical basis, a word formation lexicon, and a lemmatized and syntactically annotated corpus.