2024
DOI: 10.3390/app14135696
A Comprehensive Analysis of Various Tokenizers for Arabic Large Language Models

Faisal Qarah,
Tawfeeq Alsanoosy

Abstract: Pretrained language models have achieved great success in various natural language understanding (NLU) tasks due to their capacity to capture deep contextualized information in text using pretraining on large-scale corpora. Tokenization plays a significant role in the process of lexical analysis. Tokens become the input for other natural language processing (NLP) tasks, like semantic parsing and language modeling. However, there is a lack of research on the evaluation of the impact of tokenization on the Arabi…
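To illustrate the role tokenization plays in the pipeline the abstract describes, here is a minimal sketch of greedy longest-match subword segmentation (the general idea behind WordPiece-style tokenizers). The toy vocabulary is invented for illustration and is not taken from the paper:

```python
def tokenize(word, vocab):
    """Greedily split `word` into the longest subwords found in `vocab`.

    Characters not covered by any vocabulary entry are emitted as
    single-character tokens (a simple unknown-token fallback).
    """
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest possible match first.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # No vocabulary entry matched: fall back to a single character.
            tokens.append(word[i])
            i += 1
    return tokens


# Toy Arabic vocabulary: "ال" (the definite article) and "كتاب" (book).
vocab = {"ال", "كتاب"}
print(tokenize("الكتاب", vocab))  # ['ال', 'كتاب'] — "the book" split into article + stem
```

Different vocabularies segment the same surface form differently, which is why the choice of tokenizer can matter for downstream Arabic NLP tasks.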

Cited by 2 publications
References 39 publications