Pre-trained sequence-to-sequence (seq2seq) models have achieved state-of-the-art results in grammatical error correction. However, these models are prone to a prediction bias owing to their unidirectional decoding. Therefore, this study proposes a bidirectional transformer reranker (BTR) that re-estimates the probability of each candidate sentence generated by a pre-trained seq2seq model. The BTR preserves the seq2seq-style transformer architecture but utilizes a BERT-style self-attention mechanism in the decoder to compute the probability of each target token via masked language modeling, capturing bidirectional representations of the target context. To guide the reranking process, the BTR adopts negative sampling in its objective function to minimize the unlikelihood. During inference, the BTR yields the final result after comparing the reranked top-1 candidate with the original one under an acceptance threshold λ. Experimental results show that, when reranking candidates from a pre-trained seq2seq model, T5-base, the BTR built on T5-base achieved F0.5 scores of 65.47 and 71.27 on the CoNLL-14 and Building Educational Applications 2019 (BEA) test sets, respectively, and a GLEU score of 59.52 on the JFLEG corpus, improvements of 0.36, 0.76, and 0.48 points over the original T5-base. Furthermore, when reranking candidates from T5-large, the BTR built on T5-base improved on the original T5-large by 0.26 points on the BEA test set.
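To make the inference-time decision concrete, the following is a minimal sketch of one plausible reading of the acceptance rule: the reranked top-1 candidate replaces the seq2seq top-1 only if its reranker score exceeds the original's by at least λ. The names `btr_score`, `seq2seq_top1`, and `lam` are hypothetical placeholders, and the exact acceptance criterion used in the paper may differ.

```python
def choose_final_output(seq2seq_top1, candidates, btr_score, lam):
    """Pick between the seq2seq top-1 hypothesis and the BTR's reranked top-1.

    Assumed decision rule (not taken verbatim from the paper): accept the
    reranked candidate only when its BTR score beats the original seq2seq
    top-1 by at least the acceptance threshold `lam`.

    `btr_score(sentence)` is a hypothetical callable returning the BTR's
    masked-LM-based (log-)probability estimate for a candidate correction.
    """
    # Rerank: the candidate with the highest BTR score becomes the reranked top-1.
    reranked_top1 = max(candidates, key=btr_score)

    # Compare the reranked top-1 against the original under the threshold.
    if btr_score(reranked_top1) - btr_score(seq2seq_top1) >= lam:
        return reranked_top1
    return seq2seq_top1
```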