Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.170

BPE-Dropout: Simple and Effective Subword Regularization

Abstract: Subword segmentation is widely used to address the open vocabulary problem in machine translation. The dominant approach to subword segmentation is Byte Pair Encoding (BPE), which keeps the most frequent words intact while splitting the rare ones into multiple tokens. While multiple segmentations are possible even with the same vocabulary, BPE splits words into unique sequences; this may prevent a model from better learning the compositionality of words and being robust to segmentation errors. So far, the only…
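The idea sketched in the abstract can be illustrated with a toy re-implementation of BPE merge application: deterministic BPE always produces the same split, while randomly dropping candidate merges yields different segmentations of the same word with the same vocabulary. The snippet below is a simplified sketch of that idea, not the paper's exact algorithm; the merge table is a hypothetical example.

```python
import random

# Hypothetical toy merge table, in priority order (rank 0 = highest priority).
MERGES = [("l", "o"), ("lo", "w"), ("e", "r"), ("low", "er")]

def bpe_segment(word, merges, dropout=0.0):
    """Greedy BPE merge application. With dropout > 0, each candidate merge is
    skipped with probability `dropout`, so the segmentation becomes stochastic."""
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)
    while True:
        # Adjacent pairs that are in the merge table and survive the dropout draw.
        candidates = [
            (ranks[(a, b)], i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
            if (a, b) in ranks and random.random() >= dropout
        ]
        if not candidates:
            break
        _, i = min(candidates)  # apply the highest-priority surviving merge
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols

print(bpe_segment("lower", MERGES))               # deterministic: always ['lower']
print(bpe_segment("lower", MERGES, dropout=0.5))  # e.g. ['low', 'er'] or ['lo', 'w', 'e', 'r']
```

With dropout set to 0 the procedure reduces to standard BPE; with a small positive value the same word is exposed to the model under many different segmentations during training.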

Cited by 167 publications (189 citation statements)
References 21 publications
“…In other words, we replaced the unigram language model in OpTok with the SentencePiece tokenizer and used one tokenized sentence as an input to the same architecture. Moreover, many studies have reported that training models with stochastic tokenization leads to better performance on downstream tasks than training a model with deterministic tokenization (Kudo, 2018; Hiraoka et al., 2019; Provilkov et al., 2019). Thus, we trained the encoder and downstream model using the subword regularization provided by SentencePiece.…”
Section: Experimental Settings
confidence: 99%
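The stochastic tokenization this excerpt refers to can be reproduced with SentencePiece's sampling interface. Below is a minimal sketch, assuming a unigram SentencePiece model has already been trained and saved as spm_unigram.model (a placeholder path); enable_sampling draws a segmentation from the n-best list instead of returning the single most likely one.

```python
import sentencepiece as spm

# Placeholder path: assumes a unigram SentencePiece model trained beforehand.
sp = spm.SentencePieceProcessor(model_file="spm_unigram.model")

text = "subword regularization samples a different segmentation each time"

# Deterministic (most likely) segmentation.
print(sp.encode(text, out_type=str))

# Stochastic segmentation: nbest_size=-1 samples over all hypotheses,
# alpha is a temperature-like smoothing parameter.
for _ in range(3):
    print(sp.encode(text, out_type=str, enable_sampling=True,
                    nbest_size=-1, alpha=0.1))
```

Smaller alpha values flatten the sampling distribution and produce more diverse segmentations; values close to 1 concentrate the samples on the most likely segmentation.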
“…Thus, as shown in Figure 1(a), we apply an existing tokenizer to the given sentence, and then input the tokenized sentence into a model for a target downstream task. In the conventional approach, we obtain the most plausible tokenized sentence based on the tokenizer; however, some studies have varied the tokenization using sampling during training to enable the downstream model to adapt to various tokenizations (Kudo, 2018; Hiraoka et al., 2019; Provilkov et al., 2019). [Figure 1: Overview of (a) conventional tokenization and (b) optimizing tokenization proposed herein.] We directly optimize the tokenizer to improve the performance of the model for a downstream task using the loss of the target task.…”
Section: Introduction
confidence: 99%
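The sampling-during-training setup described above can be sketched as a data-pipeline step that re-tokenizes the corpus for every epoch. The snippet below is only an illustration of that idea, assuming the same placeholder spm_unigram.model as above and a hypothetical model.train_step interface; it is not the OpTok implementation.

```python
import sentencepiece as spm

# Placeholder model file; assumes a unigram SentencePiece model exists.
sp = spm.SentencePieceProcessor(model_file="spm_unigram.model")

def make_epoch_batches(sentences, batch_size=32):
    """Re-tokenize the corpus with sampling so that every epoch sees a
    different segmentation of the same sentences."""
    encoded = [sp.encode(s, out_type=int, enable_sampling=True,
                         nbest_size=-1, alpha=0.1) for s in sentences]
    return [encoded[i:i + batch_size] for i in range(0, len(encoded), batch_size)]

# Hypothetical training loop; `model.train_step` stands in for whatever
# downstream model is being trained.
# for epoch in range(num_epochs):
#     for batch in make_epoch_batches(train_sentences):
#         model.train_step(batch)
```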
“…Kudo (2018) introduced the training method of subword regularization. Most recently, BPE-dropout (Provilkov et al., 2019) was introduced, which modifies the original BPE's encoding process to enable stochastic segmentation. Our work shares with previous works the motivation of exposing diverse subword candidates to NMT models, but differs in that our method uses gradient signals.…”
Section: Related Work
confidence: 99%
“…In this regard, Kudo (2018) proposed subword regularization, a training method that exposes multiple segmentations using a unigram language model. Starting from machine translation, it has been shown that subword regularization can improve the robustness of NLP models in various tasks (Kim, 2019; Provilkov et al., 2019; Drexler and Glass, 2019; Müller et al., 2019).…”
Section: Introduction
confidence: 99%
“…We use the implementation in the YouTokenToMe library. It is fast and offers the BPE-dropout (Provilkov et al., 2019) regularization technique.…”
Section: Text Encoding Considerations
confidence: 99%
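As a concrete illustration of the excerpt above, the sketch below trains a BPE model with YouTokenToMe and encodes a sentence with and without BPE-dropout. The file names train.txt and bpe.model are placeholders, and the vocabulary size is an arbitrary example value.

```python
import youtokentome as yttm

# Placeholder paths: assumes a plain-text training corpus at train.txt and
# writes the learned merges to bpe.model.
yttm.BPE.train(data="train.txt", vocab_size=8000, model="bpe.model")

bpe = yttm.BPE(model="bpe.model")
sentence = "BPE dropout makes segmentation stochastic"

# Deterministic BPE segmentation.
print(bpe.encode([sentence], output_type=yttm.OutputType.SUBWORD))

# BPE-dropout: each merge is skipped with the given probability during encoding,
# so repeated calls can return different segmentations.
print(bpe.encode([sentence], output_type=yttm.OutputType.SUBWORD, dropout_prob=0.1))
```

The dropout_prob argument controls how often merges are skipped at encoding time; a small value such as 0.1 is the typical choice, while dropout_prob=0 recovers standard deterministic BPE.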