2021
DOI: 10.5715/jnlp.28.479

Optimizing Word Segmentation for Downstream Tasks by Weighting Text Vector

Abstract: In traditional NLP, we tokenize a sentence as a preprocessing step, so the tokenization is unrelated to the downstream task. To address this issue, we propose a novel method to explore an appropriate tokenization for the downstream task. Our proposed method, Optimizing Tokenization (OpTok), is trained to assign a high probability to such an appropriate tokenization based on the downstream task loss. OpTok can be used for any downstream task that uses a sentence vector representation, such as text classification. …
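
The abstract describes weighting sentence vectors by tokenization probabilities and training those probabilities with the downstream loss. The sketch below illustrates that idea only under stated assumptions: the class name, the mean-pooling encoder, and the way candidate tokenizations and their log-probabilities are produced are all hypothetical and not taken from the paper's released code.

```python
# Minimal sketch of tokenization weighting in the spirit of OpTok.
# Assumption: an upstream tokenizer (e.g. a unigram LM) supplies N-best
# candidate tokenizations and their log-probabilities; both are hypothetical here.
import torch
import torch.nn as nn


class WeightedTokenizationClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def encode(self, token_ids):
        # Sentence vector for one candidate tokenization: mean of token embeddings.
        mask = (token_ids != 0).float().unsqueeze(-1)
        emb = self.embed(token_ids) * mask
        return emb.sum(dim=0) / mask.sum().clamp(min=1.0)

    def forward(self, candidates, log_probs):
        # candidates: list of 1-D LongTensors, one per candidate tokenization.
        # log_probs: tokenization log-probabilities (learnable upstream).
        weights = torch.softmax(log_probs, dim=0)                 # normalize over candidates
        vectors = torch.stack([self.encode(c) for c in candidates])
        sentence_vec = (weights.unsqueeze(-1) * vectors).sum(dim=0)  # weighted sentence vector
        return self.classifier(sentence_vec)


# Usage: the downstream loss flows back into both the classifier and the
# tokenization weights, so candidates that help the task receive higher probability.
model = WeightedTokenizationClassifier(vocab_size=1000)
candidates = [torch.tensor([5, 42, 7]), torch.tensor([5, 99])]
log_probs = torch.tensor([-1.2, -0.8], requires_grad=True)
logits = model(candidates, log_probs)
loss = nn.functional.cross_entropy(logits.unsqueeze(0), torch.tensor([1]))
loss.backward()  # gradients reach log_probs through the softmax weights
```

This is only one way to realize "weighting the text vector": the weighted sum over candidate sentence vectors makes the tokenization probabilities differentiable with respect to the task loss, which is the property the abstract relies on.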

Cited by 4 publications · References 33 publications