This study analyzes user sentiment toward the MyPertamina application for purchasing subsidized fuel, using the Naïve Bayes algorithm. The research covers data pre-processing stages, including full preprocessing and stopword removal, together with accuracy testing under varying proportions of training and test data.

With full preprocessing, the classification model achieved 85% accuracy when 70% of the data was used for training; raising the training share to 80% and 90% increased accuracy to 87% and 89%, respectively. The more training data used, the better the performance of the Naïve Bayes classifier. Stopword removal also has a significant impact on accuracy: without it, the model reached 80%, 82%, and 84% at the 70%, 80%, and 90% splits. Although lower than with full preprocessing, these figures show the model still makes good predictions.

Based on these results, applying full preprocessing with a larger training set tends to produce the best model performance, and stopword removal contributes meaningfully to that accuracy. When developing a text classification model, comprehensive pre-processing and an appropriate stopword strategy should therefore be chosen to match the characteristics of the data and the goals of the analysis. The 90% training proportion yields the highest accuracy on the test data, but a smaller test set can also increase the variance of the results; cross-validation, or testing with more folds, would give a more comprehensive picture of the classifier's performance.
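The paper does not include its preprocessing code. The following is a minimal sketch of the kind of pipeline described (case folding, cleaning, tokenization, and optional stopword removal) for Indonesian review text; the `STOPWORDS` list and the `full_preprocess` helper are illustrative assumptions, not the study's actual resources.

```python
import re

# Illustrative Indonesian stopword list; a real pipeline would use a fuller
# resource (e.g. NLTK's "indonesian" list or Sastrawi). This is an assumption,
# not the stopword list used in the study.
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "untuk", "ini", "itu", "dengan"}

def full_preprocess(text: str, remove_stopwords: bool = True) -> str:
    """Case-fold, strip non-letters, tokenize on whitespace,
    and optionally drop stopwords."""
    text = text.lower()                    # case folding
    text = re.sub(r"[^a-z\s]", " ", text)  # remove punctuation and digits
    tokens = text.split()                  # simple whitespace tokenization
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return " ".join(tokens)

# Hypothetical review text:
print(full_preprocess("Aplikasi ini sering error di saat pembelian BBM!"))
# -> "aplikasi sering error saat pembelian bbm"
```

Running the same pipeline with `remove_stopwords=False` corresponds to the "without stopword removal" configuration compared above.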
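The 70/80/90 split comparison could be reproduced along the following lines with scikit-learn. This is a sketch under assumptions: the study does not specify its feature representation, so bag-of-words counts (`CountVectorizer`) with `MultinomialNB` are used here, and `evaluate_split`, `texts`, and `labels` are hypothetical names.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

def evaluate_split(texts, labels, train_fraction):
    """Train Multinomial Naive Bayes on `train_fraction` of the data
    and return accuracy on the remaining test portion."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, train_size=train_fraction,
        stratify=labels, random_state=42)
    vec = CountVectorizer()   # bag-of-words features (an assumption)
    clf = MultinomialNB()
    clf.fit(vec.fit_transform(X_train), y_train)
    preds = clf.predict(vec.transform(X_test))
    return accuracy_score(y_test, preds)

# `texts` holds preprocessed reviews, `labels` their sentiment classes:
# for frac in (0.7, 0.8, 0.9):
#     print(frac, evaluate_split(texts, labels, frac))
```

Stratifying the split keeps class proportions comparable across the three configurations, so accuracy differences reflect the amount of training data rather than a shifting class balance.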
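For the cross-validation suggested in the conclusion, a stratified k-fold sketch might look as follows; the 10-fold setting and the pipeline structure are assumptions for illustration, not part of the study's reported method.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def cross_validate_nb(texts, labels, n_splits=10):
    """Estimate Naive Bayes accuracy with stratified k-fold CV, averaging
    over several train/test partitions instead of relying on a single
    (possibly small) test set."""
    pipeline = make_pipeline(CountVectorizer(), MultinomialNB())
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    scores = cross_val_score(pipeline, texts, labels, cv=cv,
                             scoring="accuracy")
    return scores.mean(), scores.std()
```

Averaging accuracy over folds reduces sensitivity to any single small test set, which addresses the variance concern raised for the 90% training split.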