The COVID-19 pandemic affected all groups of people. It triggered anxiety, which is harmful to everyone. The government plays an influential role in addressing these problems through its work programs, but those programs also draw both support and opposition, which in turn causes public anxiety. It is therefore necessary to detect anxiety in order to improve government programs and raise public expectations. This study applies machine learning to detect anxiety in social media comments about government programs for dealing with the pandemic. The approach adopts sentiment analysis, detecting anxiety from netizens' positive and negative comments. The machine learning methods implemented are K-NN, Bernoulli, Decision Tree Classifier, Support Vector Classifier, Random Forest, and XGBoost. The data sample was obtained by crawling YouTube comments and comprises 4862 comments: 3211 negative and 1651 positive. Negative comments indicate anxiety, while positive comments indicate hope (not anxious). The models are trained on features extracted with count vectorization and TF-IDF. The sentiment data were split into 3889 and 973 comments for training and testing, respectively, and the greatest accuracy was achieved by Random Forest, at 84.99% with count vectorization and 82.63% with TF-IDF. K-NN achieved the best precision, while XGBoost achieved the best recall. Thus, Random Forest is the most accurate method for detecting a person's anxiety from social media data.
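The two feature extraction schemes named in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the whitespace tokenizer, the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1`, the L2 normalization, and the three sample comments are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; real comments need more cleaning.
    return text.lower().split()


def count_vectors(docs):
    """Return (vocabulary, list of raw term-count vectors), one vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[t] for t in vocab])
    return vocab, vectors


def tfidf_vectors(count_vecs):
    """Reweight count vectors by a smoothed IDF, then L2-normalize each row."""
    n_docs = len(count_vecs)
    n_terms = len(count_vecs[0])
    # Document frequency: number of documents in which each term occurs.
    df = [sum(1 for row in count_vecs if row[j] > 0) for j in range(n_terms)]
    # Assumed smoothing: idf = ln((1 + n) / (1 + df)) + 1.
    idf = [math.log((1 + n_docs) / (1 + df[j])) + 1 for j in range(n_terms)]
    result = []
    for row in count_vecs:
        weighted = [row[j] * idf[j] for j in range(n_terms)]
        norm = math.sqrt(sum(w * w for w in weighted)) or 1.0
        result.append([w / norm for w in weighted])
    return result


# Hypothetical comments, not taken from the study's dataset.
comments = ["program ini bagus", "program ini buruk", "bagus sekali"]
vocab, counts = count_vectors(comments)
tfidf = tfidf_vectors(counts)
```

Either matrix (`counts` or `tfidf`) could then be passed to a classifier. In the TF-IDF rows, terms that occur in fewer documents receive larger weights than terms shared across many documents.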
YouTube is the most widely used platform in Indonesia, reaching 88% of the country's internet users. The volume of YouTube comments written in Indonesian has increased massively, and these datasets can be used to examine the polarization of public opinion on government policies. The main challenge in opinion analysis is preprocessing, especially normalizing noise such as stop words and slang words. This research aims to devise several preprocessing models for the YouTube comment dataset and then observe their effect on the accuracy of sentiment analysis. The types of preprocessing used include standard Indonesian text processing, deleting stop words and subjects or objects, and converting slang according to the Indonesian Dictionary (KBBI). Four preprocessing scenarios are designed to show the impact of each type of preprocessing on model accuracy. The investigation uses two feature sets: unigrams, and a combination of unigrams and bigrams. Count-Vectorizer and TF-IDF-Vectorizer are used to extract features. The experiments show that unigram features perform better than the combination of unigram and bigram features. Converting slang words to standard words raises model accuracy, and removing stop words also contributes to increased accuracy. In conclusion, the combination of standard preprocessing, stop-word removal, and conversion of Indonesian slang to standard words based on the Indonesian Dictionary (KBBI) raises accuracy by almost 3.5% with unigram features.
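The preprocessing steps and the two feature sets described in this abstract can be sketched as follows. This is an illustration only: the `SLANG` and `STOP_WORDS` tables are tiny hypothetical stand-ins (not the KBBI or the study's stop-word list), the function names and flags are invented for this sketch, and the sample comment is made up.

```python
import re

# Hypothetical lookup tables for illustration; a real system would load a
# KBBI-based slang dictionary and a full Indonesian stop-word list.
SLANG = {"gak": "tidak", "yg": "yang", "bgt": "sangat"}
STOP_WORDS = {"yang", "di", "dan", "ini"}


def preprocess(text, normalize_slang=True, remove_stop_words=True):
    """Lowercase, keep alphabetic tokens, then optionally map slang and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if normalize_slang:
        tokens = [SLANG.get(t, t) for t in tokens]
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def ngram_features(tokens, use_bigrams=False):
    """Return unigram features, optionally extended with adjacent-word bigrams."""
    features = list(tokens)
    if use_bigrams:
        features += [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return features


comment = "Kebijakan yg ini gak bagus bgt!"  # hypothetical comment
tokens = preprocess(comment)
unigrams = ngram_features(tokens)
unigrams_bigrams = ngram_features(tokens, use_bigrams=True)
```

Toggling the two flags of `preprocess` gives one simple way to compare preprocessing scenarios against each other. Slang is mapped before stop words are removed, so a slang form such as "yg" is first converted to "yang" and then dropped.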
Sentiment analysis can detect hate speech using Natural Language Processing (NLP). This process requires the text to be annotated with labels. When carried out by people, however, annotation must be done by experts in the field of hate speech to avoid subjectivity. In addition, manual annotation of large datasets takes a long time and is prone to errors. To solve this problem, we propose an automatic annotation process based on semi-supervised learning with the K-Nearest Neighbor (KNN) algorithm. The process uses term frequency-inverse document frequency (TF-IDF) feature extraction to obtain optimal results. KNN with TF-IDF was able to annotate the data, raising hate speech detection accuracy from 57.25% in the initial iteration to 59.68% (reported as an increase of < 2%). The process annotates an initial dataset of 13169 items with an 80:20 split between training and testing data. There are 2370 labeled items, 1317 items for testing, and 9482 unannotated items after preprocessing. The final output of the KNN and TF-IDF annotation process contains 11235 annotated items.
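The semi-supervised annotation idea in this abstract can be sketched as a generic self-training loop around a k-nearest-neighbour vote. This is not the authors' implementation: the cosine similarity, the vote-share `threshold`, the function names, the label strings, and the toy two-dimensional vectors (standing in for TF-IDF features) are all assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(x, labeled, k=3):
    """Return (majority label, vote share) among the k most similar labeled vectors."""
    nearest = sorted(labeled, key=lambda item: cosine(x, item[0]), reverse=True)[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)


def self_train(labeled, unlabeled, k=3, threshold=0.6, max_iter=10):
    """Repeatedly pseudo-label confident items and add them to the labeled pool."""
    labeled = list(labeled)
    pending = list(unlabeled)
    for _ in range(max_iter):
        confident, remaining = [], []
        for x in pending:
            label, share = knn_predict(x, labeled, k)
            if share >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new could be annotated with enough agreement
        labeled.extend(confident)
        pending = remaining
        if not pending:
            break
    return labeled, pending


# Toy feature vectors standing in for TF-IDF rows; the labels are hypothetical.
seed = [([1.0, 0.0], "hate"), ([0.9, 0.1], "hate"),
        ([0.0, 1.0], "neutral"), ([0.1, 0.9], "neutral")]
unannotated = [[0.8, 0.2], [0.2, 0.8]]
annotated, leftover = self_train(seed, unannotated)
```

Items whose neighbours disagree too much stay in `leftover` rather than receiving a low-confidence label, which is one common way to limit error propagation in self-training.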
This research aims to analyze the tourism marketing strategy of Yogyakarta using SWOT analysis, following a Community Based Tourism (CBT) approach. CBT is the empowerment of the local community, whose members are involved in planning, managing, and decision making for development. The research is motivated by the Indonesian government's demand for tourism development in the Special Region of Yogyakarta, with the goal of making Yogyakarta a leading tourism destination in Southeast Asia. The research is conducted in Yogyakarta, which has four districts and one city. It uses both primary and secondary data. The respondents are foreign and domestic tourists and stakeholder officials, 300 people in total, selected by convenience sampling. The secondary data analysis shows that the number of foreign and domestic tourists visiting the Special Region of Yogyakarta has been increasing over the last three years. Community-based tourism destinations in the Special Region of Yogyakarta are also increasing in number and continually innovating. The primary data analysis shows that destination quality, satisfaction, and image as perceived by visitors are good, but visitor loyalty is poor. Most tourists visit the Special Region of Yogyakarta only once and treat it as a transit destination. Visits can be increased by promoting cultural destinations, which are quite attractive to tourists. Integrated promotion of CBT needs further improvement.