The COVID-19 pandemic has had a major impact and has prompted many people to express their opinions on Twitter. Community responses are both negative and positive. The dataset comes from Kaggle and contains more than 750 tweets. Classification is performed with the Naive Bayes method. The implementation, which consists of preprocessing (case folding, tokenizing, and stopword removal), TF-IDF weighting, and cross-validation, produces fairly high accuracy. After classification, the results are validated with k-fold cross-validation. The best result is obtained at cv5, with accuracy = 0.847, precision = 0.855, recall = 0.83, and F1 score = 0.842.
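The preprocessing and validation steps named in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the stopword list, the regular-expression tokenizer, and the helper names `preprocess`, `kfold_indices`, and `f1_score` are all hypothetical. The last line checks that the reported F1 score follows from the reported precision and recall.

```python
import re

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "is", "a", "of", "and", "to", "in"}

def preprocess(text):
    """Case folding, tokenizing, and stopword removal."""
    tokens = re.findall(r"[a-z]+", text.lower())  # lowercase, keep alphabetic runs
    return [t for t in tokens if t not in STOPWORDS]

def kfold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(preprocess("The pandemic is SCARY, and the lockdown is hard!"))
# -> ['pandemic', 'scary', 'lockdown', 'hard']
print(round(f1_score(0.855, 0.83), 3))
# -> 0.842
```

In practice the folds would usually be shuffled or stratified by class before splitting; contiguous folds are used here only to keep the sketch short.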
Hoax news in Indonesia causes various problems, so news items need to be classified as either hoax or valid. Naive Bayes is an algorithm that can perform this classification, but it has a weakness: attribute selection can affect accuracy, so it needs to be optimized by weighting attributes with the TF-IDF method. Classification using Naive Bayes with TF-IDF attribute weighting on a dataset of 600 items yielded 82% accuracy, 84% precision, and 89% recall. The suggestion put forward is to use a larger dataset in order to achieve higher accuracy.
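One common way to combine TF-IDF weighting with Naive Bayes is to treat the TF-IDF weights as fractional term counts in a multinomial model. The sketch below assumes that approach, a smoothed IDF formula, and Laplace smoothing with `alpha = 1.0`. The abstract does not specify any of these choices, and the four toy documents are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def fit_idf(docs):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}

def tfidf(doc, idf):
    """Raw term count times IDF; terms unseen in training are ignored."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}

def train_nb(docs, labels, idf, alpha=1.0):
    """Multinomial Naive Bayes using TF-IDF weights as fractional counts."""
    class_counts = Counter(labels)
    weights = defaultdict(lambda: defaultdict(float))
    for doc, y in zip(docs, labels):
        for t, w in tfidf(doc, idf).items():
            weights[y][t] += w
    vocab = list(idf)
    model = {}
    for y, n_y in class_counts.items():
        total = sum(weights[y].values()) + alpha * len(vocab)
        log_prior = math.log(n_y / len(docs))
        log_like = {t: math.log((weights[y][t] + alpha) / total) for t in vocab}
        model[y] = (log_prior, log_like)
    return model

def predict(doc, model, idf):
    """Return the class with the highest log-posterior score."""
    feats = tfidf(doc, idf)
    scores = {
        y: log_prior + sum(w * log_like[t] for t, w in feats.items())
        for y, (log_prior, log_like) in model.items()
    }
    return max(scores, key=scores.get)

# Invented toy data: already tokenized, two documents per class.
docs = [
    ["miracle", "cure", "shocking"],
    ["shocking", "secret", "cure"],
    ["ministry", "report", "data"],
    ["official", "data", "report"],
]
labels = ["hoax", "hoax", "valid", "valid"]

idf = fit_idf(docs)
model = train_nb(docs, labels, idf)
print(predict(["shocking", "cure"], model, idf))    # -> hoax
print(predict(["official", "report"], model, idf))  # -> valid
```

With only four documents the predictions simply reflect vocabulary overlap; a real evaluation would need a held-out split of a labeled corpus such as the 600-item dataset the abstract describes.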