We explore the task of automatic assessment of argument quality. To that end, we actively collected 6.3k arguments, more than five times the amount of previously examined data. Each argument was explicitly and carefully annotated for its quality. In addition, 14k pairs of arguments were annotated independently, identifying the higher-quality argument in each pair. Despite the inherently subjective nature of the task, both annotation schemes led to surprisingly consistent results. We release the labeled datasets to the community. Furthermore, we suggest neural methods based on a recently released language model for argument ranking as well as for argument-pair classification. In the former task, our results are comparable to the state of the art; in the latter task, our results significantly outperform earlier methods.
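To make the argument-pair classification setting concrete, the following is a minimal sketch of deciding which of two arguments is preferred using a pretrained transformer, assuming the Hugging Face transformers library; the checkpoint name, label convention, and overall setup are illustrative assumptions rather than the authors' configuration, and the model would need to be fine-tuned on the released pairs before its predictions are meaningful.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # hypothetical choice of pretrained language model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def higher_quality(arg_a: str, arg_b: str) -> str:
    # Encode the pair as one sequence: [CLS] arg_a [SEP] arg_b [SEP]
    inputs = tokenizer(arg_a, arg_b, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Assumed label convention: 0 = first argument preferred, 1 = second.
    return "first" if logits.argmax(dim=-1).item() == 0 else "second"

print(higher_quality(
    "We should ban smoking in public spaces because it harms bystanders.",
    "Smoking is simply bad.",
))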
Identifying the quality of free-text arguments has become an important task in the rapidly expanding field of computational argumentation. In this work, we explore the challenging task of argument quality ranking. To this end, we created a corpus of 30,497 arguments carefully annotated for point-wise quality, released as part of this work. To the best of our knowledge, this is the largest dataset annotated for point-wise argument quality, larger by a factor of five than previously released datasets. Moreover, we address the core issue of inducing a labeled score from crowd annotations by performing a comprehensive evaluation of different approaches to this problem. In addition, we analyze the quality dimensions that characterize this dataset. Finally, we present a neural method for argument quality ranking, which outperforms several baselines on our own dataset, as well as previous methods published for another dataset.
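As an illustration of the score-induction problem mentioned above, the sketch below derives a point-wise quality score by simply averaging binary crowd judgments per argument; the data format is hypothetical, and this plain average is only one of several aggregation approaches one could evaluate, not necessarily the one adopted in the paper.

from collections import defaultdict

# (argument_id, annotator_id, judgment) triples, judgment in {0, 1};
# the identifiers and values are made up for illustration.
annotations = [
    ("arg1", "w1", 1), ("arg1", "w2", 1), ("arg1", "w3", 0),
    ("arg2", "w1", 0), ("arg2", "w2", 0), ("arg2", "w3", 1),
]

def average_scores(rows):
    # Point-wise score = fraction of annotators who judged the argument as good.
    totals, counts = defaultdict(float), defaultdict(int)
    for arg_id, _annotator, judgment in rows:
        totals[arg_id] += judgment
        counts[arg_id] += 1
    return {arg_id: totals[arg_id] / counts[arg_id] for arg_id in totals}

print(average_scores(annotations))  # roughly {'arg1': 0.67, 'arg2': 0.33}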
This paper investigates core semantic properties that distinguish between different types of gradable adjectives and the effect of context on their interpretation. We contend that all gradable adjectives are interpreted relative to a comparison class (van Rooij 2011), and that it is the nature of the comparison class that constitutes the main semantic difference between their subclasses: some adjectives select a class comprised of counterparts of the individual of which the adjective is predicated, while others select an extensional category of this individual. We propose, following Kennedy (2007), that the standard of membership is selected according to a principle of economy whereby an interpretation relative to a maximum or a minimum degree within a comparison class takes precedence over one relative to an arbitrary point. This proposal captures so-called "standard shift" effects, that is, the influence of context on the interpretation of gradable adjectives from all subclasses, whether in their positive form or when modified by degree adverbials. Additionally, this proposal captures cases of apparent lack of context sensitivity (e.g. intuitive inference patterns, unacceptability of for-phrases, etc.). Finally, we hypothesize that the type of comparison class is aligned with the well known distinction between stage-level and individual-level predicates.
The field of Grammatical Error Correction (GEC) has produced various systems that address either focused phenomena or general text editing. We propose an automatic way to combine black-box systems. Our method automatically detects the strength of a system, or of a combination of several systems, per error type, improving precision and recall while directly optimizing the F-score. We show consistent improvement over the best standalone system in all the configurations tested. This approach also outperforms average ensembling of different RNN models with random initializations. In addition, we analyze the use of BERT for GEC, reporting promising results. We also present a spellchecker created for this task which outperforms standard spellcheckers. This paper describes a system submission to the Building Educational Applications 2019 Shared Task: Grammatical Error Correction (Bryant et al., 2019). Combining the outputs of the top BEA 2019 shared task systems with our approach currently holds the highest reported score in the open phase of the shared task, improving F0.5 by 3.7 points over the best previously reported result.
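To illustrate the per-error-type combination idea, the following is a rough sketch, under assumed data structures, of selecting between two black-box systems based on their estimated development-set precision for each error type; the actual submission optimizes F0.5 directly and must handle overlapping and conflicting edits, which this toy version ignores.

# Hypothetical dev-set precision per error type for two black-box systems.
dev_precision = {
    "sysA": {"SPELL": 0.92, "VERB:TENSE": 0.55, "PREP": 0.40},
    "sysB": {"SPELL": 0.70, "VERB:TENSE": 0.68, "PREP": 0.61},
}

def pick_system(error_type: str) -> str:
    # Choose the system with the higher estimated precision for this error type.
    return max(dev_precision, key=lambda sys: dev_precision[sys].get(error_type, 0.0))

def combine(edits_by_system):
    # edits_by_system: {system: [(span, correction, error_type), ...]}
    # Keep an edit only if its system is the strongest one for that error type.
    combined = []
    for system, edits in edits_by_system.items():
        combined.extend(e for e in edits if pick_system(e[2]) == system)
    return combined

print(combine({
    "sysA": [((3, 4), "received", "SPELL")],
    "sysB": [((3, 4), "recieves", "SPELL"), ((7, 8), "in", "PREP")],
}))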