2021
DOI: 10.12928/telkomnika.v19i4.20369

Enhancing text classification performance by preprocessing misspelled words in Indonesian language

Abstract: Supervised learning with shallow machine learning methods is still popular for text processing, despite rapid advances in unsupervised deep learning methodologies. Supervised classification of sentiment in application user feedback written in Indonesian is a popular application in both the research community and industry. However, due to the nature of shallow machine learning approaches, various text preprocessing techniques are required to clean the i…
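To illustrate the supervised, shallow approach the abstract describes, here is a minimal multinomial Naive Bayes sentiment classifier in plain Python. Naive Bayes is only a representative shallow method, and that choice is an assumption because the abstract does not name the classifier. The Indonesian feedback sentences, the `positif`/`negatif` labels, and the helper names `train_nb` and `predict_nb` are hypothetical and exist only for this illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately simple preprocessing: lowercase, then split on whitespace.
    return text.lower().split()


def train_nb(docs, labels, alpha=1.0):
    """Train a multinomial Naive Bayes model with add-alpha (Laplace) smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> token frequencies
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total = len(labels)
    log_priors = {c: math.log(n / total) for c, n in class_counts.items()}
    return log_priors, word_counts, vocab, alpha


def predict_nb(model, doc):
    """Return the label with the highest log-posterior score."""
    log_priors, word_counts, vocab, alpha = model
    scores = {}
    for c, log_prior in log_priors.items():
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        score = log_prior
        for tok in tokenize(doc):
            if tok in vocab:  # tokens never seen in training are skipped
                score += math.log((word_counts[c][tok] + alpha) / denom)
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy training data.
docs = [
    "aplikasi bagus sekali",
    "aplikasi sangat bagus",
    "aplikasi jelek dan lambat",
    "sangat lambat dan jelek",
]
labels = ["positif", "positif", "negatif", "negatif"]
model = train_nb(docs, labels)
print(predict_nb(model, "bagus sekali"))  # positif
print(predict_nb(model, "lambat jelek"))  # negatif
```

Because `predict_nb` skips tokens it never saw during training, a misspelled word such as `bgus` contributes no evidence at all. This is one way to see why normalizing misspellings before classification can matter for shallow models.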

Cited by 7 publications (14 citation statements). References: 26 publications.
“…Fuzzy matching (FM) can be used to measure the dissimilarity between two strings, so that text matches can be found even with misspelled and different words [3]. Text similarity can also be measured with Euclidean distance similarity (EDS) [7]. Principal component analysis (PCA) is a transformation technique that reduces the size of a data set with a large number of interrelated dimensions, so that the data can be expressed with a smaller number of variables [8].…”
Section: Introduction (mentioning)
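The fuzzy matching and Euclidean distance ideas in this statement can be sketched in plain Python. Implementing fuzzy matching with Levenshtein edit distance is an assumption (the statement does not name a specific algorithm), as are the hypothetical helper `correct`, its 0.7 similarity threshold, the small Indonesian dictionary, and the conversion of a Euclidean distance d into a similarity via 1 / (1 + d), which is one common convention rather than the cited definition. PCA is not covered here.

```python
import math
from collections import Counter


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (0 if equal)
        prev = curr
    return prev[-1]


def fuzzy_ratio(a, b):
    """Similarity in [0, 1]: 1 - edit distance / length of the longer string."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))


def correct(token, dictionary, threshold=0.7):
    """Hypothetical helper: replace token with the closest dictionary word if similar enough."""
    best = max(dictionary, key=lambda w: fuzzy_ratio(token, w))
    return best if fuzzy_ratio(token, best) >= threshold else token


def euclidean_distance(doc_a, doc_b):
    """Euclidean distance between the term-frequency vectors of two token lists."""
    ca, cb = Counter(doc_a), Counter(doc_b)
    vocab = set(ca) | set(cb)
    return math.sqrt(sum((ca[w] - cb[w]) ** 2 for w in vocab))


def euclidean_similarity(doc_a, doc_b):
    """Assumed convention: map distance d to a similarity 1 / (1 + d)."""
    return 1 / (1 + euclidean_distance(doc_a, doc_b))


dictionary = ["bagus", "jelek", "lambat"]  # hypothetical reference vocabulary
print(levenshtein("bgus", "bagus"))        # 1
print(correct("bgus", dictionary))         # bagus
print(euclidean_distance(["aplikasi", "bagus"], ["aplikasi", "jelek"]))  # sqrt(2), about 1.414
```

Normalizing the edit distance by the longer string's length keeps the threshold comparable across short and long words; a raw distance of 1 means much more in a three-letter word than in a ten-letter one.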
“…The tokenizing technique helps reduce the dimensionality of the problem for a collection of texts. Splitting sentences into word tokens simplifies the analysis process [1], [7], [9]–[11]. Capitalization variability in this dataset can cause problems during classification and degrade performance.…”
Section: Introduction (mentioning)
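The tokenization and capitalization points in this statement can be sketched in a few lines of Python. Defining a token as a run of `\w+` characters is an assumption (it also keeps digits and underscores), and the helper name `preprocess` and the sample review are hypothetical; the cited works may tokenize differently.

```python
import re


def preprocess(text):
    """Case-fold the text, then split it into word tokens."""
    text = text.lower()                # removes capitalization variability
    return re.findall(r"\w+", text)    # keep runs of word characters, drop punctuation


print(preprocess("Aplikasi BAGUS, tapi Lambat!"))  # ['aplikasi', 'bagus', 'tapi', 'lambat']

# Case folding collapses surface variants into one vocabulary entry.
raw = ["Bagus", "bagus", "BAGUS"]
print(len(set(raw)))                                  # 3 distinct types
print(len({t for w in raw for t in preprocess(w)}))   # 1 after case folding
```

Collapsing the three spellings of `bagus` into one feature is a small instance of the dimensionality reduction the statement attributes to preprocessing.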