Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
DOI: 10.18653/v1/2023.emnlp-main.459

Effects of sub-word segmentation on performance of transformer language models

Jue Hou, Anisia Katinskaia, Anh-Duc Vu, et al.

Abstract: Language modeling is a fundamental task in natural language processing, which has been thoroughly explored with various architectures and hyperparameters. However, few studies focus on the effect of sub-word segmentation on the performance of language models (LMs). In this paper, we compare GPT and BERT models trained with the statistical segmentation algorithm BPE vs. two unsupervised algorithms for morphological segmentation: Morfessor and StateMorph. We train the models for several languages, including ones w…

Cited by 1 publication; references 22 publications.