Symbolic Knowledge Distillation: from General Language Models to Commonsense Models

West, Peter; Bhagavatula, Chandra; Hessel, Jack; Hwang, Jena D.; Jiang, Liwei; Le Bras, Ronan; Lu, Ximing; Welleck, Sean; Choi, Yejin

doi:10.18653/v1/2022.naacl-main.341

Cited by 74 publications

(62 citation statements)

References 41 publications


“…Recent work augmented datasets by fine-tuning a pre-trained LM on real data, then generated new, silver-labelled instances (Anaby-Tavor et al, 2020; Papanikolaou and Pierleoni, 2020; Kumar et al, 2020). Similarly, the few-shot capabilities of GPT-3 (Brown et al, 2020) were leveraged to generate free-text explanations (Wiegreffe et al, 2022), semantically-related sentence pairs (Schick and Schütze, 2021), atomic event commonsense triples (West et al, 2022), and labels for various generation and understanding tasks. In this work, we finetune GPT-3 with minimal human supervision to generate additional contextual data pertaining to events.…”

Section: LM-generated Data Augmentation (mentioning)

confidence: 99%

What happens before and after: Multi-Event Commonsense in Event Coreference Resolution

Ravi, Tanner, Ng et al. 2023

Preprint


Event coreference models cluster event mentions pertaining to the same real-world event. Recent models rely on contextualized representations to recognize coreference among lexically or contextually similar mentions. However, models typically fail to leverage commonsense inferences, which is particularly limiting for resolving lexically-divergent mentions. We propose a model that extends event mentions with temporal commonsense inferences. Given a complex sentence with multiple events, e.g., "The man killed his wife and got arrested", with the target event "arrested", our model generates plausible events that happen before the target event, such as "the police arrived", and after it, such as "he was sentenced". We show that incorporating such inferences into an existing event coreference model improves its performance, and we analyze the coreferences in which such temporal knowledge is required.


“…Recent approaches have shown a great potential to incorporate external knowledge for knowledge-based VQA. Several methods explore aggregating the external knowledge either in the form of structured knowledge graphs (Garderes et al, 2020; Narasimhan et al, 2018; Li et al, 2020b; Wang et al, 2017a,b), unstructured knowledge bases (Marino et al, 2021; Wu et al, 2022; Luo et al, 2021), and neural-symbolic inference based knowledge (Chen et al, 2020; West et al, 2021). In these methods, object detectors (Ren et al, 2015) and scene classifiers (He et al, 2016) are used to associate images with external knowledge.…”

Section: Related Work (mentioning)

confidence: 99%

KAT: A Knowledge Augmented Transformer for Vision-and-Language

Gui, Wang, Huang et al. 2022

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua


The primary focus of recent work with large-scale transformers has been on optimizing the amount of information packed into the model's parameters. In this work, we ask a complementary question: Can multimodal transformers leverage explicit knowledge in their reasoning? Existing, primarily unimodal, methods have explored approaches under the paradigm of knowledge retrieval followed by answer prediction, but leave open questions about the quality and relevance of the retrieved knowledge used, and how the reasoning processes over implicit and explicit knowledge should be integrated. To address these challenges, we propose a Knowledge Augmented Transformer (KAT), which achieves a strong state-of-the-art result (+6% absolute) on the open-domain multimodal task of OK-VQA. Our approach integrates implicit and explicit knowledge in an encoder-decoder architecture, while still jointly reasoning over both knowledge sources during answer generation. Additionally, explicit knowledge integration improves interpretability of model predictions in our analysis. Code and pre-trained models are released at https://github.com/guilk/KAT.


“…Commonsense knowledge acquisition is a longstanding challenge in natural language processing (Charniak, 1973; Hwang et al, 2021; Zhang et al, 2021), and current approaches rely on knowledge acquired by pre-trained Transformer language models (Bosselut et al, 2019; Zhang et al, 2020; West et al, 2021). The commonsense reasoning ability of these language models has been evaluated using behavioral probes (Ettinger, 2020; Misra et al, 2021; He et al, 2021) and downstream, fine-tuned evaluations (Banerjee et al, 2021; Zhou et al, 2021;.…”

Section: Related Work (mentioning)

confidence: 99%

Does Pre-training Induce Systematic Inference? How Masked Language Models Acquire Commonsense Knowledge

Porada, Sordoni, Cheung 2022

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua


Transformer models pre-trained with a masked-language-modeling objective (e.g., BERT) encode commonsense knowledge as evidenced by behavioral probes; however, the extent to which this knowledge is acquired by systematic inference over the semantics of the pretraining corpora is an open question. To answer this question, we selectively inject verbalized knowledge into the pre-training minibatches of BERT and evaluate how well the model generalizes to supported inferences after pre-training on the injected knowledge. We find generalization does not improve over the course of pre-training BERT from scratch, suggesting that commonsense knowledge is acquired from surface-level, co-occurrence patterns rather than induced, systematic reasoning.
