We propose a new deterministic approach to coreference resolution that combines the global information and precise features of modern machine-learning models with the transparency and modularity of deterministic, rule-based systems. Our sieve architecture applies a battery of deterministic coreference models one at a time from highest to lowest precision, where each model builds on the previous model's cluster output. The two stages of our sieve-based architecture, a mention detection stage that heavily favors recall, followed by coreference sieves that are precision-oriented, offer a powerful way to achieve both high precision and high recall. Further, our approach makes use of global information through an entity-centric model that encourages the sharing of features across all mentions that point to the same real-world entity. Despite its simplicity, our approach gives state-of-the-art performance on several corpora and genres, and has also been incorporated into hybrid state-of-the-art coreference systems for Chinese and Arabic. Our system thus offers a new paradigm for combining knowledge in rule-based systems that has implications throughout computational linguistics.
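The sieve idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the mention records, the three toy sieves (`exact_match`, `head_match`, `pronoun_match`) and the `run_sieves` helper are all hypothetical names invented for illustration, and mention detection is assumed to have already happened. The sketch only shows the control flow: sieves run one at a time from highest to lowest precision, each operating on the clusters left by the previous one, and the pronoun sieve consults attributes pooled over a whole cluster (the entity-centric part).

```python
from typing import Callable, Dict, List, Set

Mention = Dict[str, object]            # hypothetical mention record
Cluster = List[Mention]
Sieve = Callable[[Cluster, Cluster], bool]


def genders(cluster: Cluster) -> Set[str]:
    """Pool the known gender values of every mention in a cluster."""
    return {m["gender"] for m in cluster if m["gender"] != "unknown"}


def exact_match(a: Cluster, b: Cluster) -> bool:
    """Toy high-precision sieve: identical strings, pronouns excluded."""
    return any(m["text"].lower() == n["text"].lower()
               for m in a for n in b
               if not m["pronoun"] and not n["pronoun"])


def head_match(a: Cluster, b: Cluster) -> bool:
    """Toy medium-precision sieve: identical head words, pronouns excluded."""
    return any(m["head"] == n["head"]
               for m in a for n in b
               if not m["pronoun"] and not n["pronoun"])


def pronoun_match(a: Cluster, b: Cluster) -> bool:
    """Toy low-precision sieve: a pronoun-only cluster joins a cluster
    whose pooled gender attributes are compatible with its own."""
    if not all(m["pronoun"] for m in b):
        return False
    return bool(genders(a) & genders(b))


def run_sieves(mentions: List[Mention], sieves: List[Sieve]) -> List[Cluster]:
    """Apply each sieve in order; every sieve sees the previous sieve's clusters."""
    clusters: List[Cluster] = [[m] for m in mentions]   # start from singletons
    for sieve in sieves:
        merged: List[Cluster] = []
        for cluster in clusters:
            for target in merged:
                if sieve(target, cluster):
                    target.extend(cluster)              # merge into earlier cluster
                    break
            else:
                merged.append(cluster)                  # no antecedent cluster found
        clusters = merged
    return clusters


# Made-up mentions, in document order.
mentions = [
    {"text": "Barack Obama", "head": "obama", "gender": "male", "pronoun": False},
    {"text": "Obama", "head": "obama", "gender": "unknown", "pronoun": False},
    {"text": "Paris", "head": "paris", "gender": "neutral", "pronoun": False},
    {"text": "he", "head": "he", "gender": "male", "pronoun": True},
]

clusters = run_sieves(mentions, [exact_match, head_match, pronoun_match])
cluster_texts = [[m["text"] for m in c] for c in clusters]
```

In this toy run the head-match sieve first joins "Obama" to "Barack Obama", and only afterwards does the pronoun sieve attach "he" to that cluster, using the gender contributed by "Barack Obama". Ordering the sieves by precision means the riskier pronoun decision is made with the most information available.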
A definition of metonymy that has gained some popularity in Cognitive Linguistics contrasts metonymical semantic shifts within a domain or domain matrix with metaphorical shifts that cross domain boundaries. In the past few years, however, this definition of metonymy has become subject to more and more criticism, on the grounds that it relies too much on the vague notions of domains or domain matrices to be fully reliable. In this article, we address this problem by focusing on a nonunitary, prototypical definition of contiguity (the concept that used to be seen as the defining feature of metonymy before Cognitive Linguistics introduced domains and domain matrices). On the basis of the traditional pre-structuralist literature on metonymy, we identify a large number of typical metonymical patterns, and show that they can be classified in terms of the type of contiguity they are motivated by. We argue that metonymies, starting from spatial part-whole contiguity as the core of the category, can be plotted against three dimensions: strength of contact (going from part-whole containment over physical contact to adjacency without contact), boundedness (involving an extension of the part-whole relationship towards unbounded wholes and parts), and domain (with shifts from the spatial to the temporal, the spatiotemporal and the categorial domain).
Link to this article: http://journals.cambridge.org/abstract_S1351324910000161
How to cite this article: YVES PEIRSMAN, DIRK GEERAERTS and DIRK SPEELMAN (2010). The automatic identification of lexical variation between language varieties. Natural Language Engineering, 16, pp 469-491.
Abstract: Languages are not uniform. Speakers of different language varieties use certain words differently: more or less frequently, or with different meanings. We argue that distributional semantics is the ideal framework for the investigation of such lexical variation. We address two research questions and present our analysis of the lexical variation between Belgian Dutch and Netherlandic Dutch. The first question involves a classic application of distributional models: the automatic retrieval of synonyms. We use corpora of two different language varieties to identify the Netherlandic Dutch synonyms for a set of typically Belgian words. Second, we address the problem of automatically identifying words that are typical of a given lect, either because of their high frequency or because of their divergent meaning. Overall, we show that distributional models are able to identify more lectal markers than traditional keyword methods. Distributional models also have a bias towards a different type of variation. In summary, our results demonstrate how distributional semantics can help research in variational linguistics, with possible future applications in lexicography or terminology extraction.
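As a rough illustration of the first research question (retrieving a synonym in one variety for a word from another), the sketch below builds simple co-occurrence vectors and compares them by cosine similarity. It is an assumption-laden toy, not the model used in the article: the helper names (`context_vectors`, `cosine`, `nearest`), the window size, the candidate list and the few made-up sentences standing in for the two corpora are all hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List


def context_vectors(sentences: List[str], window: int = 2) -> Dict[str, Counter]:
    """Count, for every word, the words occurring within `window` tokens of it."""
    vectors: Dict[str, Counter] = {}
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            context = tokens[lo:i] + tokens[i + 1:hi]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def nearest(word: str, source: Dict[str, Counter],
            target: Dict[str, Counter], candidates: List[str]) -> str:
    """Return the candidate whose target-corpus vector is closest to
    the source-corpus vector of `word`."""
    query = source[word]
    return max(candidates, key=lambda w: cosine(query, target[w]))


# Tiny made-up stand-ins for a Belgian Dutch and a Netherlandic Dutch corpus.
belgian = [
    "de camion rijdt over de snelweg",
    "de chauffeur laadt de camion",
]
netherlandic = [
    "de vrachtwagen rijdt over de snelweg",
    "de chauffeur laadt de vrachtwagen",
    "de fiets staat in de schuur",
]

best = nearest("camion", context_vectors(belgian),
               context_vectors(netherlandic), ["vrachtwagen", "fiets"])
```

The comparison works only because both varieties share most of their context words, so vectors built from separate corpora live in the same space. A realistic setup would need large corpora and some weighting of the raw counts.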