2014
DOI: 10.4028/www.scientific.net/amm.543-547.1896

The Research on Tibetan Text Classification Based on N-Gram Model

Abstract: Compared with traditional text classification models, the Tibetan text classification approach studied here applies the N-Gram model at the word level. As a result, word segmentation is not required during classification, and feature selection and extensive preprocessing are avoided. This paper examines N-Gram models in depth and discusses the choice of the parameter N in the model using a Naïve Bayes Multinomial classifier.
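
The idea in the abstract, classifying text from N-Gram counts without a word segmentation step, can be illustrated with a minimal sketch. It is not the paper's implementation: the names `char_ngrams` and `NGramMultinomialNB`, the Laplace smoothing constant `alpha`, and the choice of a sliding window over Unicode code points are all assumptions made for illustration, and the toy strings stand in for real Tibetan text.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n):
    """Return the overlapping n-grams of `text`, sliced by code point (assumed unit)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NGramMultinomialNB:
    """Hypothetical multinomial Naive Bayes over n-gram counts with Laplace smoothing."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n          # the parameter N discussed in the abstract
        self.alpha = alpha  # smoothing constant (an assumption, not from the paper)

    def fit(self, texts, labels):
        self.class_docs = Counter(labels)         # documents per class
        self.gram_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()                        # all n-grams seen in training
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_docs.items():
            counts = self.gram_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # log prior plus the sum of smoothed log likelihoods of each n-gram
            score = math.log(doc_count / self.total_docs)
            for gram in grams:
                score += math.log((counts[gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data: class "A" is rich in "a", class "B" is rich in "b".
model = NGramMultinomialNB(n=2).fit(
    ["aaab", "abaa", "bbba", "babb"], ["A", "A", "B", "B"]
)
print(model.predict("aaaa"))
print(model.predict("bbbb"))
```

Changing `n` in the constructor is the simplest way to explore how the choice of N affects the result on a given corpus; no tokenizer or feature selection step is involved.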

Cited by 1 publication (1 citation statement)
References 6 publications

“…Given that different tokenization methods can impact both training efficiency and accuracy of the model, the following analysis provides a brief overview of the effects of these different tokenization methods. [10] [11] involves grouping N characters together and splitting a sentence into segments of N characters each. It is primarily used for calculating the probability of a sentence.…”
Section: Analysis Of Different Tokenizers (mentioning)
confidence: 99%