We propose a novel method for finding an appropriate tokenization for a given downstream model by jointly optimizing a tokenizer and the model. The proposed method places no restriction on the downstream model other than that it provide loss values for training the tokenizer, so it can be applied to various NLP tasks. Moreover, the method can search for a tokenization that improves the performance of an already trained model as a post-processing step. The proposed method is therefore applicable in a wide range of situations. We evaluated whether our method improves performance on text classification in three languages and on machine translation in eight language pairs. Experimental results show that the proposed method improves performance by determining appropriate tokenizations.
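To make the joint-optimization idea concrete, the following is a minimal sketch, not the paper's exact algorithm: a unigram-style tokenizer assigns scores to a few candidate tokenizations of a sentence, the candidates' representations are mixed by those (softmaxed) scores, and the downstream loss is backpropagated into the tokenizer's scores so that it learns which tokenization helps the task. All module names, dimensions, and the toy data below are illustrative assumptions.

```python
# Sketch: train a tokenizer from downstream loss only (illustrative, not
# the authors' implementation). Assumes candidate tokenizations are given.
import torch
import torch.nn as nn

vocab_size, emb_dim, num_classes = 100, 16, 2

token_emb = nn.Embedding(vocab_size, emb_dim)        # shared token embeddings
token_logit = nn.Parameter(torch.zeros(vocab_size))  # unigram scores = the "tokenizer"
classifier = nn.Linear(emb_dim, num_classes)         # downstream model

params = list(token_emb.parameters()) + [token_logit] + list(classifier.parameters())
optim = torch.optim.Adam(params, lr=1e-2)

def sentence_vec(token_ids):
    """Encode one tokenization as the mean of its token embeddings."""
    return token_emb(token_ids).mean(dim=0)

def tokenization_score(token_ids):
    """Unigram-style score of a whole tokenization."""
    return token_logit[token_ids].sum()

# Toy example: one sentence with three candidate tokenizations and a label.
candidates = [torch.tensor([3, 7]), torch.tensor([3, 8, 9]), torch.tensor([12])]
label = torch.tensor([1])

for step in range(100):
    # Weight the candidates by the tokenizer's softmaxed scores...
    weights = torch.softmax(
        torch.stack([tokenization_score(c) for c in candidates]), dim=0)
    # ...and mix their sentence vectors into one expected representation.
    mixed = sum(w * sentence_vec(c) for w, c in zip(weights, candidates))
    loss = nn.functional.cross_entropy(classifier(mixed).unsqueeze(0), label)
    optim.zero_grad()
    loss.backward()  # gradients reach token_logit: tokenizer learns from task loss
    optim.step()
```

The key design point the sketch illustrates is that the tokenization weights are differentiable, so the downstream loss alone drives the tokenizer; the same loop works unchanged when the downstream model is already trained and only the tokenizer parameters are updated.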