Background
The COVID-19 pandemic is severely affecting people worldwide. Currently, an important approach to understand this phenomenon and its impact on the lives of people consists of monitoring social networks and news on the internet.
Objective
The purpose of this study is to present a methodology to capture the main subjects and themes under discussion in news media and social media and to apply this methodology to analyze the impact of the COVID-19 pandemic in Brazil.
Methods
This work proposes a methodology based on topic modeling, namely entity recognition, and sentiment analysis of texts to compare Twitter posts and news, followed by visualization of the evolution and impact of the COVID-19 pandemic. We focused our analysis on Brazil, an important epicenter of the pandemic; therefore, we faced the challenge of addressing Brazilian Portuguese texts.
Results
In this work, we collected and analyzed 18,413 articles from news media and 1,597,934 tweets posted by 1,299,084 users in Brazil. The results show that the proposed methodology improved the topic sentiment analysis over time, enabling better monitoring of internet media. Additionally, with this tool, we extracted some interesting insights about the evolution of the COVID-19 pandemic in Brazil. For instance, we found that Twitter presented similar topic coverage to news media; the main entities were similar, but they differed in theme distribution and entity diversity. Moreover, some aspects represented negative sentiment toward political themes in both media, and a high incidence of mentions of a specific drug denoted high political polarization during the pandemic.
Conclusions
This study identified the main themes under discussion in both news and social media and how their sentiments evolved over time. It is possible to understand the major concerns of the public during the pandemic, and all the obtained information is thus useful for decision-making by authorities.
Identifying subjective sentences and classifying the polarity of subjective sentences are two important tasks in sentiment analysis. Besides being a hot topic, there is still a lack of resources to perform the mentioned sentiment analysis tasks in the Portuguese language, with its syntactic specificities. This paper describes the identified challenges and next steps in an initial study regarding the classification of subjectivity and polarity of sentences with a small set of syntactic features extracted directly from the text. Our approach reached satisfying results in experiments with two classic machine learning models in four datasets consisting of user reviews from different domains.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.