Background: Since the first COVID-19 vaccine appeared, there has been a growing tendency to automatically determine public attitudes toward it. In particular, it was important to find the reasons for vaccine hesitancy, since it was directly correlated with pandemic protraction. Natural language processing (NLP) and public health researchers have turned to social media (eg, Twitter, Reddit, and Facebook) for user-created content from which they can gauge public opinion on vaccination. To automatically process such content, they use a number of NLP techniques, most notably topic modeling. Topic modeling enables the automatic uncovering and grouping of hidden topics in the text. When applied to content that expresses a negative sentiment toward vaccination, it can give direct insight into the reasons for vaccine hesitancy.

Objective: This study applies NLP methods to classify vaccination-related tweets by sentiment polarity and uncover the reasons for vaccine hesitancy among the negative tweets in the Serbian language.

Methods: To study the attitudes and beliefs behind vaccine hesitancy, we collected 2 batches of tweets that mention some aspects of COVID-19 vaccination. The first batch of 8817 tweets was manually annotated as either relevant or irrelevant regarding the COVID-19 vaccination sentiment, and then the relevant tweets were annotated as positive, negative, or neutral. We used the annotated tweets to train a sequential bidirectional encoder representations from transformers (BERT)-based classifier for 2 tweet classification tasks to augment this initial data set. The first classifier distinguished between relevant and irrelevant tweets. The second classifier used the relevant tweets and classified them as negative, positive, or neutral. This sequential classifier was used to annotate the second batch of tweets. The combined data sets resulted in 3286 tweets with a negative sentiment: 1770 (53.9%) from the manually annotated data set and 1516 (46.1%) as a result of automatic classification. Topic modeling methods (latent Dirichlet allocation [LDA] and nonnegative matrix factorization [NMF]) were applied using the 3286 preprocessed tweets to detect the reasons for vaccine hesitancy.

Results: The relevance classifier achieved an F-score of 0.91 and 0.96 for relevant and irrelevant tweets, respectively. The sentiment polarity classifier achieved an F-score of 0.87, 0.85, and 0.85 for negative, neutral, and positive sentiments, respectively. By summarizing the topics obtained in both models, we extracted 5 main groups of reasons for vaccine hesitancy: concern over vaccine side effects, concern over vaccine effectiveness, concern over insufficiently tested vaccines, mistrust of authorities, and conspiracy theories.

Conclusions: This paper presents a combination of NLP methods applied to find the reasons for vaccine hesitancy in Serbia. Given these reasons, it is now possible to better understand the concerns of people regarding the vaccination process.
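
The sequential classification described in the Methods (a relevance filter followed by a 3-way sentiment classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `classify_sequentially` and `annotate_batch` are hypothetical, and the two toy keyword rules merely stand in for the fine-tuned BERT models, which the abstract does not specify in detail. The example tweets are invented, and in English, purely for illustration.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple

# Assumption: each stage is any callable that maps a tweet to a label.
# In the study these would be fine-tuned BERT-based classifiers.
RelevanceModel = Callable[[str], str]  # "relevant" or "irrelevant"
SentimentModel = Callable[[str], str]  # "negative", "neutral", or "positive"


def classify_sequentially(
    tweet: str,
    relevance_model: RelevanceModel,
    sentiment_model: SentimentModel,
) -> str:
    """Stage 1 filters out irrelevant tweets; stage 2 labels the rest."""
    if relevance_model(tweet) != "relevant":
        return "irrelevant"
    return sentiment_model(tweet)


def annotate_batch(
    tweets: Iterable[str],
    relevance_model: RelevanceModel,
    sentiment_model: SentimentModel,
) -> List[Tuple[str, str]]:
    """Apply the two-stage classifier to every tweet in a batch."""
    return [
        (tweet, classify_sequentially(tweet, relevance_model, sentiment_model))
        for tweet in tweets
    ]


# Toy keyword rules (hypothetical placeholders, NOT the paper's models).
def toy_relevance(tweet: str) -> str:
    return "relevant" if "vaccine" in tweet.lower() else "irrelevant"


def toy_sentiment(tweet: str) -> str:
    text = tweet.lower()
    if "side effects" in text or "not tested" in text:
        return "negative"
    if "grateful" in text:
        return "positive"
    return "neutral"


batch = [
    "I worry about vaccine side effects",
    "Grateful I got my vaccine today",
    "Vaccine appointments open on Monday",
    "Nice weather in Belgrade",
]

annotated = annotate_batch(batch, toy_relevance, toy_sentiment)
label_counts = Counter(label for _, label in annotated)

# Only the negative tweets would be passed on to topic modeling.
negative_tweets = [tweet for tweet, label in annotated if label == "negative"]
```

Keeping the two stages as separate callables mirrors the design in the abstract: the sentiment model only ever sees tweets the relevance model accepted, so the pool of negative tweets fed to LDA and NMF excludes off-topic content.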