Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d18-1149

Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement

Abstract: We propose a conditional non-autoregressive neural sequence model based on iterative refinement. The proposed model is designed based on the principles of latent variable models and denoising autoencoders, and is generally applicable to any sequence generation task. We extensively evaluate the proposed model on machine translation (En↔De and En↔Ro) and image caption generation, and observe that it significantly speeds up decoding while maintaining the generation quality comparable to the autoregressive counterpart.
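To make the decoding procedure described in the abstract concrete, below is a minimal sketch of iterative-refinement decoding, not the authors' implementation: the helper names `encode`, `predict_length`, and `refine` are stand-ins for an encoder, a target-length predictor, and one parallel denoising pass over the whole draft.

```python
# Minimal sketch of iterative-refinement decoding for a conditional
# non-autoregressive model. `encode`, `predict_length`, and `refine` are
# hypothetical callables, not the paper's actual API.

from typing import Callable, List

def iterative_refinement_decode(
    source: List[str],
    encode: Callable[[List[str]], object],
    predict_length: Callable[[object], int],
    refine: Callable[[object, List[str]], List[str]],
    max_iters: int = 10,
    pad_token: str = "<pad>",
) -> List[str]:
    """Decode by repeatedly denoising a full-length draft in parallel."""
    enc = encode(source)              # encode the source once
    tgt_len = predict_length(enc)     # target length is fixed up front
    hypo = [pad_token] * tgt_len      # initial all-placeholder draft

    for _ in range(max_iters):
        new_hypo = refine(enc, hypo)  # re-predict every position in parallel
        if new_hypo == hypo:          # stop early once the draft is stable
            break
        hypo = new_hypo
    return hypo
```

Each refinement pass rewrites all target positions at once, so the number of sequential steps is bounded by `max_iters` rather than by the target length.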

Cited by 332 publications (413 citation statements)
References 29 publications
“…The main difficulty is converting the insert operations into in-place edits at each x_i. Other parallel models (Ribeiro et al., 2018; Lee et al., 2018) have used methods like predicting insertion slots in a pre-processing step, or predicting zero or more tokens in between any two tokens in x. We will see in Section 3.1.4 that these options do not perform well.…”
Section: The Seq2edits Function
confidence: 99%
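The "insertion slot" alternative mentioned in the statement above amounts to predicting, for each of the len(x)+1 gaps in the source, zero or more tokens to splice in. The sketch below only illustrates how such per-slot predictions would be assembled into an output; `slot_predictions` is a hypothetical input, not part of any cited system.

```python
# Illustrative assembly of per-slot insertion predictions: one slot before
# each source token plus one after the final token.

from typing import List

def apply_insertion_slots(x: List[str], slot_predictions: List[List[str]]) -> List[str]:
    """Interleave predicted insertions with the original tokens of x."""
    assert len(slot_predictions) == len(x) + 1, "one slot per gap"
    out: List[str] = []
    for i, token in enumerate(x):
        out.extend(slot_predictions[i])   # tokens inserted before position i
        out.append(token)                 # keep the original token in place
    out.extend(slot_predictions[-1])      # insertions after the final token
    return out

# Inserting "very" into the slot before "good" turns
# "a good day" into "a very good day".
print(apply_insertion_slots(["a", "good", "day"], [[], ["very"], [], []]))
```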
“…We achieve the effect of fertility using delete and append edits. Lee et al. (2018) generate target sequences iteratively but require the target sequence length to be predicted at the start. In contrast, our in-place edit model allows the target sequence length to change with appends.…”
Section: Spell Correction
confidence: 99%
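The contrast drawn above, a length fixed up front versus edits that grow or shrink the output, can be illustrated with a toy edit applier. The operation names (keep, substitute, delete, append) follow the statement's wording, but the function itself is only a sketch, not the cited model.

```python
# Toy illustration: per-token edit operations can change the output length,
# which a fixed-length refinement scheme cannot do.

from typing import List, Tuple

def apply_edits(x: List[str], edits: List[Tuple[str, str]]) -> List[str]:
    """Apply one (op, argument) edit per source token; length may grow or shrink."""
    assert len(edits) == len(x)
    out: List[str] = []
    for token, (op, arg) in zip(x, edits):
        if op == "keep":
            out.append(token)
        elif op == "substitute":
            out.append(arg)               # replace the token, length unchanged
        elif op == "delete":
            pass                          # drop the token, length shrinks
        elif op == "append":
            out.extend([token, arg])      # keep the token and add one, length grows
    return out

# "speling is is hard" -> "spelling is hard indeed"
print(apply_edits(
    ["speling", "is", "is", "hard"],
    [("substitute", "spelling"), ("keep", ""), ("delete", ""), ("append", "indeed")],
))
```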
“…The loss of autoregressive dependency largely hurts the consistency of the output sentences, increases the difficulty of the learning process, and thus leads to low-quality translation. Previous works mainly focus on adding different components to the NART model to improve the expressiveness of the network structure and overcome the loss of autoregressive dependency (Gu et al., 2017; Lee et al., 2018; Kaiser et al., 2018). However, the computational overhead of new components will hurt the inference speed, contradicting the goal of NART models: to parallelize and speed up neural machine translation models.…”
Section: Introduction
confidence: 99%