Categorization is a central capability of human cognition, and a number of theories have been developed to account for properties of categorization. Despite the fact that many semantic tasks involve categorization, theories of categorization do not play a major role in contemporary research in computational linguistics. This paper follows the idea that embedding-based models of semantics lend themselves well to being formulated in terms of classical categorization theories. The benefit is a group of models that enables (a) the formulation of hypotheses about the impact of major design decisions, and (b) a transparent assessment of these decisions. We instantiate this idea on the frame-semantic frame identification task. We define four models that cross two design variables: (a) the choice of prototype vs. exemplar categorization, corresponding to different degrees of generalization applied to the input, and (b) the presence vs. absence of a fine-tuning step, corresponding to generic vs. task-adaptive categorization. We find that for frame identification, generalization and task-adaptive categorization both yield substantial benefits. Our prototype-based, fine-tuned model, which combines the best choices over these variables, establishes a new state-of-the-art in frame identification.
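The prototype vs. exemplar contrast described above can be illustrated with a toy sketch. This is not the authors' model: the vectors, the labels `frame_A` and `frame_B`, and the function names are hypothetical, cosine similarity is an assumed choice of similarity measure, and the exemplar variant is simplified to a nearest-single-instance decision. Under these assumptions, a prototype categorizer compares the input to one averaged vector per category, while an exemplar categorizer compares it to each stored instance.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def prototype_classify(query, labeled):
    """Assign the category whose centroid (prototype) is most similar."""
    groups = defaultdict(list)
    for vec, label in labeled:
        groups[label].append(vec)
    # One averaged vector per category: maximal generalization over instances.
    prototypes = {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in groups.items()
    }
    return max(prototypes, key=lambda label: cosine(query, prototypes[label]))


def exemplar_classify(query, labeled):
    """Assign the category of the single most similar stored instance."""
    _, label = max(labeled, key=lambda item: cosine(query, item[0]))
    return label


# Hypothetical 2-d "embeddings" with made-up category labels.
labeled = [
    ([1.0, 0.0], "frame_A"),
    ([0.0, 1.0], "frame_A"),
    ([1.0, 0.5], "frame_B"),
]
query = [1.0, 1.0]

print(prototype_classify(query, labeled))
print(exemplar_classify(query, labeled))
```

On this toy data the two strategies disagree: the averaged vector for `frame_A` points in the same direction as the query, whereas the closest individual instance belongs to `frame_B`. The example only shows that the amount of generalization applied to the stored data can change the decision; it makes no claim about which choice works better in practice.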
Characterizing paraphrases formally has proven to be a challenging task. Hasegawa et al. (2011) pointed out the usefulness of FrameNet for paraphrase research, focusing on paraphrases which are backed by underlying classical linguistic relationships such as synonymy or voice alternations. This article proposes that other frame-to-frame relations, notably Using, can serve as a source for concept-based paraphrases – that is, paraphrases that are backed by common sense knowledge, as in he called him a hero – he praised him for being a hero. While the predicates in these sentences are not synonymous, we would argue that the sentences are paraphrases – albeit of a kind that involves world knowledge about the relationship between different event classes. First, we propose a shallow taxonomy for the frame pairs which instantiate Using, motivated by their ability to form concept-based paraphrases. Second, we analyze the subclass of Using instances which supports concept-based paraphrasing, and provide a formalization of some prominent types of side conditions that are necessary to produce felicitous paraphrases.
Lexical resources such as WordNet (Miller, 1995) and FrameNet (Baker et al., 1998) are organized as graphs, where relationships between words are made explicit via the structure of the resource. This work explores how structural information from these lexical resources can lead to gains in a downstream task, namely frame identification. While much of the current work in frame identification uses various neural architectures to predict frames, those neural architectures only use representations of frames based on annotated corpus data. We demonstrate how incorporating knowledge directly from the FrameNet graph structure improves the performance of a neural network-based frame identification system. Specifically, we construct a bidirectional LSTM with a loss function that incorporates various graph- and corpus-based frame embeddings for learning and ultimately achieves strong performance gains with the graph-based embeddings over corpus-based embeddings alone.
Extending semantic role labeling (SRL) to detect and recover non-local arguments continues to be a challenge. Our work is the first to address the detection of implicit roles from a multilingual perspective. We map predicate-argument structures across English and German sentences, and we develop a classifier that distinguishes implicit arguments from other translation shifts. Using a combination of alignment statistics and linguistic features, we achieve a precision of 0.68 despite a limited training set, which is a significant gain over the majority baseline. Our approach does not rely on pre-existing knowledge bases and can be extended to any language pair with parallel data and dependency parses.