2019
DOI: 10.22219/kinetik.v4i4.912

The Effect of Stemming and Removal of Stopwords on the Accuracy of Sentiment Analysis on Indonesian-language Texts

Abstract: Preprocessing is an essential task for sentiment analysis, since textual information carries a lot of noisy and unstructured data. Stemming and stopword removal are both popular preprocessing techniques for text classification. However, prior research reports conflicting results on how the two methods influence the accuracy of sentiment classification. Therefore, this paper further investigates the effect of stemming and stopword removal on Indonesian language sentiment analys…

Citations: cited by 60 publications (34 citation statements)
References: 25 publications
“…Most of the recent literature presents methods that are based on statistical approaches. The sources mainly use the TF-IDF method to analyze the text, such as [2], [3], [4], [5], [6], [7]. In a wide group of languages, the stopwords may be simply exposed by their relatively higher occurrence frequencies.…”
Section: Data Description (mentioning)
confidence: 99%
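The frequency-based idea in the statement above can be illustrated with a minimal sketch. It is not the method of the cited paper: the helper `frequent_terms`, the `min_doc_ratio` threshold, and the toy corpus are all hypothetical, chosen only to show how words that occur in most documents surface as stopword candidates.

```python
from collections import Counter

def frequent_terms(documents, min_doc_ratio=0.8):
    """Return words that appear in at least min_doc_ratio of the documents.

    Hypothetical helper: the 0.8 threshold is an assumption, not a value
    taken from any of the cited papers.
    """
    doc_freq = Counter()
    for doc in documents:
        # Count each word once per document (document frequency).
        doc_freq.update(set(doc.lower().split()))
    threshold = min_doc_ratio * len(documents)
    return sorted(word for word, n in doc_freq.items() if n >= threshold)

# Toy Indonesian corpus, made up for illustration only.
docs = [
    "saya suka film yang bagus",
    "film yang buruk",
    "lagu yang indah",
    "makanan yang enak",
    "buku yang menarik",
]
candidates = frequent_terms(docs)
print(candidates)
```

On a corpus this small only the word shared by every document passes the threshold; a real corpus would need a tuned threshold and manual review of the candidates.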
“…Nevertheless, in order to provide some level of protection of the Twitter users whose tweets were included in this study, usernames have been removed from the data made available in the supporting information for this study. The Twitter API limits the retrieval of tweets to tweets from the past week; in order to circumvent this limitation, we used the software GetOldTweets by Jefferson Henrique [88], which has gained some prominence among social media researchers in recent years, as it permits the collection of up to 100% of all available tweets under the search terms entered [85, 89–92]. As this method of data collection does not include any metadata, no personal data was collected within the context of this study.…”
Section: Data Collection (mentioning)
confidence: 99%
“…The output of the tokenization process, a collection of words in an array, is processed further by removing common words that occur frequently (stop words) but do not need to be taken into account to obtain the main context of a text. Examples of stop words in Indonesian are 'yang', 'di', 'untuk', and 'dari' [14].…”
Section: Stop Word (unclassified)
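
The two steps described in this statement, tokenization followed by stop word removal, can be sketched as follows. This is a minimal illustration under stated assumptions: the regex tokenizer is a simplification, the stop list holds only the four examples named in the quote, and the sample sentence is made up.

```python
import re

# Illustrative stop list: only the four examples named in the quote,
# not a complete Indonesian stop word list.
STOP_WORDS = {"yang", "di", "untuk", "dari"}

def tokenize(text):
    """Lowercase the text and split it into an array of alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop list."""
    return [token for token in tokens if token not in stop_words]

# Made-up sample sentence for illustration.
tokens = tokenize("Buku yang dibeli di toko itu untuk adik dari ibu")
filtered = remove_stop_words(tokens)
print(filtered)
```

Matching is done on whole tokens, so a word such as "dibeli" is kept even though it begins with the stop word "di".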