Universal Dependencies (UD) is a framework for morphosyntactic annotation of human language, which to date has been used to create treebanks for more than 100 languages. In this article, we outline the linguistic theory of the UD framework, which draws on a long tradition of typologically oriented grammatical theories. Grammatical relations between words are centrally used to explain how predicate–argument structures are encoded morphosyntactically in different languages, while morphological features and part-of-speech classes give the properties of words. We argue that this theory is a good basis for cross-linguistically consistent annotation of typologically diverse languages in a way that supports computational natural language understanding as well as broader linguistic studies.
This paper presents the Coptic Universal Dependency Treebank, the first dependency treebank within the Egyptian subfamily of the Afro-Asiatic languages. We discuss the composition of the corpus and the challenges in adapting the UD annotation scheme to existing conventions for annotating Coptic, and we evaluate inter-annotator agreement on UD annotation for the language. Some specific constructions are taken as a starting point for discussing several more general UD annotation guidelines, in particular for appositions, ambiguous passivization, incorporation and object-doubling.
Humans use natural language, vision, and context to resolve referents in their environment. While some situated reference resolution is trivial, ambiguous cases arise when the language is underspecified or there are multiple candidate referents. This study investigates how pragmatic modulators external to the linguistic content are critical for the correct interpretation of referents in these scenarios. In particular, we demonstrate in a human subjects experiment how the social norms applicable in the given context influence the interpretation of referring expressions. Additionally, we highlight how current coreference tools in natural language processing fail to handle these ambiguous cases. We also briefly discuss the implications of this work for assistive robots which will routinely need to resolve referents in their environment.
The current investigation addresses a vital lacuna in forensic authorship studies, and more concretely, in Native Language Influence Detection (NLID) research: narrowing down a speaker’s native dialect instead of only their native language (L1), which might not be enough when carrying out sociolinguistic profiling tasks. Native Dialect Influence Detection (NDID), the focus of our study, can thus greatly aid at the investigative level. We approach this topic by providing a comprehensive analysis of linguistic features that serve to identify two non-contact dialects of L1 Spanish (i.e., Mexican and Peninsular varieties) when dealing with data written in L2 English, which come from Tripadvisor. Our main aim is to investigate if an author’s L2 features can point to their L1 native dialect, rather than only to their native language. Findings point to L1 dialectal transfer of punctuation signs, adjectives of affect, and intensifiers: these linguistic features, even when expressed in an L2, show a culturally bound use. Additionally, we implemented an automatic classifier that achieved an accuracy of 69% in categorizing test data, using only linguistic features that have explanatory power and can aid linguistic theory. This is key for explainability in the forensic context, which Native Language Identification (NLI) studies tend to neglect (Kingston 2019). Results show that L1 Spanish dialects can be differentiated by analyzing L2 English text, pointing to NDID as a fertile approach for narrowing down candidate L1 dialects of a language when analyzing L2 data.