Procedures that force LMs to focus more on a prompt, or on a specific part of it, when generating or ranking tokens can benefit algorithms that search for combinations of words through sampling. It would be interesting to use coherence boosting in non-autoregressive text generation algorithms, for example, to accelerate the mixing of MCMC methods for constrained text generation (e.g., Miao et al., 2019; Zhang et al., 2020a; Malkin et al., 2021); a sketch of such a boosted proposal step is given below.

The method of Holtzman et al. (2021) normalizes answer scores by their unconditional probability; CC (Zhao et al., 2021) is a contextual calibration method; and Channel (Min et al., 2021) uses an inverted-LM scoring approach that computes the conditional probability of the input given the label.
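Returning to the MCMC proposal above, the following is a minimal sketch of how boosted next-token scores could serve as a proposal distribution inside such a sampler. The function names and the NumPy interface are illustrative assumptions; only the log-linear contrast of full-context and short-context logits, with boosting coefficient alpha, follows the coherence boosting formulation.

```python
import numpy as np

def coherence_boosted_logits(full_logits, short_logits, alpha):
    """Log-linear contrast of the full-context LM with a short-context
    ("premature") copy of the same LM:
        (1 + alpha) * log p(x_t | full ctx) - alpha * log p(x_t | short ctx).
    alpha > 0 up-weights tokens whose likelihood depends on the distant context.
    """
    return (1.0 + alpha) * full_logits - alpha * short_logits

def sample_boosted_token(full_logits, short_logits, alpha, rng=None):
    """Draw one token from the renormalized boosted distribution, e.g. as a
    proposal move inside an MCMC step for constrained generation."""
    rng = rng if rng is not None else np.random.default_rng()
    logits = coherence_boosted_logits(
        np.asarray(full_logits), np.asarray(short_logits), alpha
    )
    probs = np.exp(logits - logits.max())  # softmax, numerically stabilized
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```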
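For contrast, the scoring rules of the three baselines can be summarized in a few lines. In this sketch, `logprob(prefix, continuation)` is a hypothetical helper returning the LM log-probability of `continuation` given `prefix`; the content-free input "N/A" follows the example in Zhao et al. (2021), and the subtraction form used for CC is a simplification of their affine calibration.

```python
def direct_score(logprob, x, y):
    # Plain conditional scoring: log p(y | x).
    return logprob(x, y)

def unconditional_norm_score(logprob, x, y):
    # Unconditional probability normalization (Holtzman et al., 2021):
    # log p(y | x) - log p(y); the empty prefix approximates p(y).
    return logprob(x, y) - logprob("", y)

def cc_score(logprob, x, y, content_free="N/A"):
    # Contextual calibration (Zhao et al., 2021), simplified here to
    # dividing by the score assigned to y given a content-free input.
    return logprob(x, y) - logprob(content_free, y)

def channel_score(logprob, x, y):
    # Channel / inverted-LM scoring (Min et al., 2021): log p(x | y).
    return logprob(y, x)
```

Under this formulation, all four rules rank the same candidate labels but differ only in how the raw conditional score is corrected for the label string's a priori likelihood.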