BeliefBank: Adding Memory to a Pre-Trained Language Model for a Systematic Notion of Belief

Kassner, Nora; Tafjord, Oyvind; Schütze, Hinrich; Clark, Peter

doi:10.48550/arxiv.2109.14723

Cited by 3 publications

(6 citation statements)

References 16 publications


“…[7] evaluate and improve factual consistency of pre-trained LMs across paraphrasings of factual statements. [15] consider the responses of a pre-trained LM to a stream of questions, and evaluate and improve the consistency and accuracy of its answers over time. [16] collect counterfactual instances to evaluate the overreliance of NLP models on spurious attributes.…”

Section: Related Work (mentioning)

confidence: 99%


Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models

Maharana, Kamath, Clark et al. 2023

Preprint


As general purpose vision models get increasingly effective at a wide set of tasks, it is imperative that they be consistent across the tasks they support. Inconsistent AI models are considered brittle and untrustworthy by human users and are more challenging to incorporate into larger systems that take dependencies on their outputs. Measuring consistency between very heterogeneous tasks that might include outputs in different modalities is challenging since it is difficult to determine if the predictions are consistent with one another. As a solution, we introduce a benchmark dataset, COCOCON, where we use contrast sets created by modifying test instances for multiple tasks in small but semantically meaningful ways to change the gold label, and outline metrics for measuring if a model is consistent by ranking the original and perturbed instances across tasks. We find that state-of-the-art systems suffer from a surprisingly high degree of inconsistent behavior across tasks, especially for more heterogeneous tasks. Finally, we propose using a rank correlation-based auxiliary objective computed over large automatically created cross-task contrast sets to improve the multi-task consistency of large unified models, while retaining their original accuracy on downstream tasks.



“…In computer vision, cross-task consistency has been of some interest for classical tasks [26], while in natural language processing past work has studied consistency between tasks like question-answering [15]. However, in vision-and-language research, much work has focused on…”

Section: Introduction (mentioning)

confidence: 99%


“…Selected Tasks GLUECons contains tasks ranging over five different types of problems categorized based on the type of available knowledge. This includes 1) Classification with label dependencies: Mutual exclusivity in multiclass classification using MNIST (LeCun et al 1998) and Hierarchical image classification using CIFAR 100 (Krizhevsky and Hinton 2009), 2) Self-Consistency in decisions: What-If Question Answering (Tandon et al 2019), Natural Language Inference (Bowman et al 2015), BeliefBank (Kassner et al 2021), 3) Consistency with external knowledge: Entity and Relation Extraction using CONLL2003 (Sang and De Meulder 2003), 4) Structural Consistency: BIO Tagging, 5) Constraints in (un/semi)supervised setting: MNIST Arithmetic and Sudoku. These tasks either use existing datasets or are extensions of existing tasks, reformulated so that the usage of knowledge is applicable to them.…”

Section: Knowledge Integration Solution (mentioning)

confidence: 99%

“…We use off-the-shelf ILP tools that perform an efficient search and offer a natural way to integrate constraints. However, constraints should be converted to a linear form to be able to exploit these tools (Kordjamshidi, Roth, and Wu 2015; Kordjamshidi et al 2016).…”

Section: Constraint Integration In Prior Research (mentioning)

confidence: 99%

GLUECons: A Generic Benchmark for Learning under Constraints

Faghihi, Nafar, Chen et al. 2023

AAAI


Recent research has shown that integrating domain knowledge into deep learning architectures is effective; It helps reduce the amount of required data, improves the accuracy of the models' decisions, and improves the interpretability of models. However, the research community lacks a convened benchmark for systematically evaluating knowledge integration methods. In this work, we create a benchmark that is a collection of nine tasks in the domains of natural language processing and computer vision. In all cases, we model external knowledge as constraints, specify the sources of the constraints for each task, and implement various models that use these constraints. We report the results of these models using a new set of extended evaluation criteria in addition to the task performances for a more in-depth analysis. This effort provides a framework for a more comprehensive and systematic comparison of constraint integration techniques and for identifying related research challenges. It will facilitate further research for alleviating some problems of state-of-the-art neural models.


“…Similar to the efforts done for overcoming lack of consistency under different paraphrases, Hase et al (2021) add another loss term to their objective function to minimize the error across entailed data. On the other hand, Kassner et al (2021) use a feedback mechanism that issues relevant information from a symbolic memory of beliefs as input to the system during test-time in order to improve consistency under entailment.…”

Section: Consistency (mentioning)

confidence: 99%

A Review on Language Models as Knowledge Bases

Badr, Li, Çelikyılmaz et al. 2022

Preprint


Recently, there has been a surge of interest in the NLP community on the use of pretrained Language Models (LMs) as Knowledge Bases (KBs). Researchers have shown that LMs trained on a sufficiently large (web) corpus will encode a significant amount of knowledge implicitly in its parameters. The resulting LM can be probed for different kinds of knowledge and thus acting as a KB. This has a major advantage over traditional KBs in that this method requires no human supervision. In this paper, we present a set of aspects that we deem an LM should have to fully act as a KB, and review the recent literature with respect to those aspects.

