State-of-the-art models in NLP are now predominantly based on deep neural networks that are opaque in terms of how they come to make predictions. This limitation has increased interest in designing more interpretable deep models for NLP that reveal the 'reasoning' behind model outputs. But work in this direction has been conducted on different datasets and tasks with correspondingly unique aims and metrics; this makes it difficult to track progress. We propose the Evaluating Rationales And Simple English Reasoning (ERASER) benchmark to advance research on interpretable models in NLP. This benchmark comprises multiple datasets and tasks for which human annotations of "rationales" (supporting evidence) have been collected. We propose several metrics that aim to capture how well the rationales provided by models align with human rationales, and also how faithful these rationales are (i.e., the degree to which provided rationales influenced the corresponding predictions). Our hope is that releasing this benchmark facilitates progress on designing more interpretable NLP systems. The benchmark, code, and documentation are available at https://www.eraserbenchmark.com/

Commonsense Explanations (CoS-E): Where do you find the most amount of leafs? (a) Compost pile (b) Flowers (c) Forest (d) Field (e) Ground

Movie Reviews: In this movie, … Plots to take over the world. The acting is great! The soundtrack is run-of-the-mill, but the action more than makes up for it. (a) Positive (b) Negative

Evidence Inference: Article: Patients for this trial were recruited … Compared with 0.9% saline, 120 mg of inhaled nebulized furosemide had no effect on breathlessness during exercise. Prompt: With respect to breathlessness, what is the reported difference between patients receiving placebo and those receiving furosemide? (a) Sig. decreased (b) No sig. difference (c) Sig. increased

e-SNLI: H: A man in an orange vest leans over a pickup truck. P: A man is touching a truck. (a) Entailment (b) Contradiction (c) Neutral
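The two kinds of metric mentioned in the abstract (agreement with human rationales, and faithfulness) can be illustrated with a minimal sketch. The abstract does not define the metrics, so everything here is an assumption for illustration: agreement is approximated by token-level F1 over rationale token indices, and faithfulness by the drop in predicted probability when the rationale tokens are removed. The names `rationale_f1`, `confidence_drop`, and `toy_model` are hypothetical and are not taken from the benchmark's code.

```python
from typing import Callable, Sequence, Set


def rationale_f1(predicted: Set[int], human: Set[int]) -> float:
    """Token-level F1 between predicted and human rationale token indices.

    Assumption: a rationale is represented as a set of token positions.
    """
    overlap = len(predicted & human)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(human)
    return 2 * precision * recall / (precision + recall)


def confidence_drop(
    predict_proba: Callable[[Sequence[str]], float],
    tokens: Sequence[str],
    rationale: Set[int],
) -> float:
    """Drop in predicted probability after deleting the rationale tokens.

    A large drop suggests the rationale actually influenced the prediction.
    """
    full = predict_proba(tokens)
    reduced = predict_proba([t for i, t in enumerate(tokens) if i not in rationale])
    return full - reduced


def toy_model(tokens: Sequence[str]) -> float:
    """Hypothetical stand-in classifier: confident only if 'great' appears."""
    return 0.9 if "great" in tokens else 0.5


review = ["the", "acting", "is", "great"]
agreement = rationale_f1({1, 2, 3}, {2, 3, 4})
drop = confidence_drop(toy_model, review, {3})
```

In this toy setup the model relies entirely on the word "great", so removing it from the input lowers the predicted probability, while removing any other token would leave it unchanged.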
We present a (randomized) test for monotonicity of Boolean functions. Namely, given the ability to query an unknown function f : {0,1}^n → {0,1} at arguments of its choice, the test always accepts a monotone f, and rejects f with high probability if it is ε-far from being monotone (i.e., every monotone function differs from f on more than an ε fraction of the domain). The complexity of the test is O(n/ε).

The analysis of our algorithm relates two natural combinatorial quantities that can be measured with respect to a Boolean function; one being global to the function and the other being local to it. A key ingredient is the use of a switching (or sorting) operator on functions.
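A tester with the stated one-sided guarantee can be sketched as follows. The abstract gives only the query complexity O(n/ε) and does not spell out the procedure, so this sketch assumes a simple edge-sampling approach: pick a random point and a random coordinate, and check that flipping that coordinate from 0 to 1 does not decrease f. The function name `edge_tester` and the constant `c` in the trial count are hypothetical choices for illustration.

```python
import math
import random
from typing import Callable, Tuple

Bits = Tuple[int, ...]


def edge_tester(
    f: Callable[[Bits], int],
    n: int,
    eps: float,
    rng: random.Random,
    c: float = 2.0,
) -> bool:
    """Return False if a monotonicity violation is found, True otherwise.

    Assumption: roughly c * n / eps random hypercube edges are examined;
    the constant c is illustrative, not taken from the source.
    """
    trials = math.ceil(c * n / eps)
    for _ in range(trials):
        x = [rng.randint(0, 1) for _ in range(n)]
        i = rng.randrange(n)
        x[i] = 0
        low = f(tuple(x))   # value with coordinate i set to 0
        x[i] = 1
        high = f(tuple(x))  # value with coordinate i set to 1
        if low > high:      # f decreased along an upward edge: not monotone
            return False
    return True


def logical_or(x: Bits) -> int:
    """A monotone function: 1 if any bit is set."""
    return int(any(x))


def negate_first(x: Bits) -> int:
    """A non-monotone function: flips the first bit."""
    return 1 - x[0]


accepted_or = edge_tester(logical_or, n=5, eps=0.1, rng=random.Random(0))
accepted_neg = edge_tester(negate_first, n=1, eps=0.5, rng=random.Random(0))
```

Because the sketch rejects only when it sees an explicit violated edge, a monotone function is never rejected, which matches the "always accepts a monotone f" property in the abstract. Each trial makes two queries, so the total query count stays proportional to n/ε.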
The field of property testing studies algorithms that distinguish, using a small number of queries, between inputs which satisfy a given property, and those that are 'far' from satisfying the property. Testing properties that are defined in terms of monotonicity has been extensively investigated, primarily in the context of the monotonicity of a sequence of integers, or the monotonicity of a function over the n-dimensional hypercube {1, …, m}^n. These works resulted in monotonicity testers whose query complexity is at most polylogarithmic in the size of the domain.

We show that in its most general setting, testing that Boolean functions are close to monotone is equivalent, with respect to the number of required queries, to several other testing problems in logic and graph theory. These problems include: testing that a Boolean assignment of variables is close to an assignment that satisfies a specific 2-CNF formula, testing that a set of vertices is close to one that is a vertex cover of a specific graph, and testing that a set of vertices is close to a clique.

We then investigate the query complexity of monotonicity testing of both Boolean and integer functions over general partial orders. We give algorithms and lower bounds for the general problem, as well as for some interesting special cases. In proving a general lower bound, we construct graphs with combinatorial properties that may be of independent interest.