Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.240
Augmenting BERT-style Models with Predictive Coding to Improve Discourse-level Representations

Abstract: Current language models are usually trained using a self-supervised scheme, where the main focus is learning representations at the word or sentence level. However, there has been limited progress in generating useful discourse-level representations. In this work, we propose to use ideas from predictive coding theory to augment BERT-style language models with a mechanism that allows them to learn suitable discourse-level representations. As a result, our proposed approach is able to predict future sentences us…
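The abstract describes an auxiliary predictive-coding objective in which a discourse-level state built on top of BERT-style sentence representations is trained to predict upcoming sentences. The snippet below is a minimal, hypothetical sketch of that idea, not the authors' released implementation: it assumes precomputed sentence embeddings (e.g., BERT [CLS] vectors), a GRU aggregator, and a CPC-style InfoNCE loss in the spirit of Oord et al. (2018). The class and parameter names (`PredictiveCodingHead`, `horizon`) are illustrative.

```python
# Illustrative sketch only: a CPC-style InfoNCE objective that trains a
# discourse-level state to predict the embeddings of future sentences.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PredictiveCodingHead(nn.Module):
    def __init__(self, dim=768, hidden=768, horizon=3):
        super().__init__()
        # GRU aggregates past sentence embeddings into a discourse state.
        self.gru = nn.GRU(dim, hidden, batch_first=True)
        # One linear predictor per future step, as in CPC.
        self.predictors = nn.ModuleList(
            [nn.Linear(hidden, dim) for _ in range(horizon)]
        )

    def forward(self, sent_emb):
        # sent_emb: (batch, n_sentences, dim) sentence embeddings of a document
        states, _ = self.gru(sent_emb)          # discourse state at each position
        loss, n = 0.0, sent_emb.size(1)
        for k, proj in enumerate(self.predictors, start=1):
            if n - k <= 0:
                continue
            pred = proj(states[:, : n - k])      # predict k sentences ahead
            target = sent_emb[:, k:]             # true future sentence embeddings
            # InfoNCE: the matching future sentence is the positive,
            # the other sentences at the same offset act as negatives.
            logits = torch.einsum("bld,bmd->blm", pred, target)
            labels = torch.arange(logits.size(1), device=logits.device)
            labels = labels.unsqueeze(0).expand(logits.size(0), -1)
            loss = loss + F.cross_entropy(logits.flatten(0, 1), labels.flatten())
        return loss / len(self.predictors)
```

In a full model, `sent_emb` would be produced by the BERT-style encoder itself, and this loss would be added to the usual masked language modeling objective.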

Cited by 4 publications (5 citation statements)
References: 30 publications
“…Pragmatic Coherence To include wider sentence context, this paradigm focuses on the incorporation of coherence in terms of capturing the transition of meaning in longer contexts. Typically, such models are trained by predicting the correct subsequent input (i.e., word sequences or sentences), inspired by the concept of predictive coding [15,4]. In this paradigm, we employ SkipThoughts [23], GPT-2 [36] as well as GPT-3 [7], as these approaches are based on next word or sentence prediction training objectives.…”
Section: Sentence Embedding Models
confidence: 99%
“…Typically, such models are trained by predicting the correct subsequent input (i.e., word sequences or sentences), inspired by the concept of predictive coding [15,4]. In this paradigm, we employ SkipThoughts [23], GPT-2 [36] as well as GPT-3 [7], as these approaches are based on next word or sentence prediction training objectives. Here, we include both GPT-2 and GPT-3 in order to examine the possible effect of the extended input length used during the pre-training procedure in GPT-3 (4096 tokens, compared to 1024 tokens for GPT-2) on the resulting neural fits.…”
Section: Sentence Embedding Models
confidence: 99%
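The excerpts above describe models trained to predict the correct subsequent input (next word or next sentence). As a concrete, standalone illustration of that training objective (independent of the cited study and its neural-fit analysis), the following computes the next-token prediction loss of the public GPT-2 model from Hugging Face Transformers on a short two-sentence passage:

```python
# Minimal illustration of the next-token prediction objective referred to
# in the excerpt, using the publicly available GPT-2 checkpoint.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

text = "The meeting was postponed. The committee will reconvene"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the average next-token
    # cross-entropy over the sequence, i.e., the language modeling loss.
    out = model(**inputs, labels=inputs["input_ids"])

print(f"next-token loss: {out.loss.item():.3f}")
print(f"perplexity: {torch.exp(out.loss).item():.1f}")
```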
“…Figure 4: Our rehearsal and anticipation (r/a) decoder receives a masked input E_{r/a} and the more recently updated memory M_{t+1} to predict the masked tokens and whether the segment belongs to the past or future. (Zhang et al., 2021) and anticipation as the prediction of the future (Oord et al., 2018; Araujo et al., 2021). To use the same machinery, we pose these processes as masked modeling tasks that predict past and future coreference-related tokens.…”
Section: Masked Modeling
confidence: 99%
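The excerpt poses rehearsal (prediction of the past) and anticipation (prediction of the future) as masked modeling tasks. The sketch below is a hypothetical rendering of that framing, not the cited paper's code: a head on top of a BERT-style encoder that recovers masked tokens and classifies whether a masked segment belongs to the past or the future. The module and variable names (`RehearsalAnticipationHead`, `mlm_head`, `time_head`) are assumptions made for illustration.

```python
# Hypothetical sketch of the masked-modeling framing described in the excerpt.
import torch
import torch.nn as nn

class RehearsalAnticipationHead(nn.Module):
    def __init__(self, vocab_size, dim=768):
        super().__init__()
        self.mlm_head = nn.Linear(dim, vocab_size)  # recover masked tokens
        self.time_head = nn.Linear(dim, 2)          # 0 = past, 1 = future

    def forward(self, hidden_states, masked_positions):
        # hidden_states: (batch, seq_len, dim) from a BERT-style encoder
        # masked_positions: (batch, seq_len) boolean mask of [MASK] tokens
        token_logits = self.mlm_head(hidden_states[masked_positions])
        # Use the first ([CLS]-like) position to classify past vs. future.
        segment_logits = self.time_head(hidden_states[:, 0])
        return token_logits, segment_logits
```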
“…These BERT-type models produce sentence representations using a special token [CLS]. More recently, some models (Lee et al., 2020; Iter et al., 2020; Araujo et al., 2021b) have been proposed to improve discourse-level representations by incorporating additional components or mechanisms into the vanilla BERT. Furthermore, due to the success of deep learning sentence encoders, some Spanish models were released.…”
Section: Sentence Encoders
confidence: 99%
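The excerpt notes that BERT-type models use the special [CLS] token as the sentence representation. For readers unfamiliar with this convention, the short example below (unrelated to any specific cited model) extracts the [CLS] hidden state from bert-base-uncased with Hugging Face Transformers:

```python
# Standard way to obtain the [CLS] sentence representation from BERT.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

sentences = ["The committee approved the proposal.",
             "However, funding remains uncertain."]
batch = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

# The hidden state at position 0 ([CLS]) is commonly used as a
# fixed-size sentence representation.
cls_embeddings = outputs.last_hidden_state[:, 0]   # shape: (2, 768)
print(cls_embeddings.shape)
```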