As the COVID‐19 pandemic has unfolded, Hate Speech on social media about China and Chinese people has encouraged social stigmatization. For historical and humanistic purposes, this history‐in‐the‐making needs to be archived and analyzed. Using the query “china+and+coronavirus” to scrape from the Twitter API, we have obtained 3,457,402 key tweets about China relating to COVID‐19. In this archive, in which about 40% of the tweets are from the U.S., we identify 25,467 Hate Speech occurrences and analyze them according to lexicon‐based emotions and demographics using machine learning and network methods. The results indicate substantial associations between the amount of Hate Speech and both expressed sentiments and state demographic factors. Sentiments of surprise and fear associated with poverty and unemployment rates are prominent. This digital archive and the related analyses are therefore not simply historical: they play vital roles in raising public awareness and mitigating future crises. Consequently, we regard our research as a pilot study in methods of analysis that might be used by other researchers in various fields.
This paper reports on a study to identify the incidence and shifting dynamics of hate speech in English-language tweets relating to the COVID-19 pandemic that were posted between February and June 2020. Tweets were repeatedly scraped, ingested and aggregated within the COVID-19 Hate Speech Twitter Archive (CHSTA) and analyzed for hate speech using the Generative Adversarial Network (GAN)-inspired DCAP Method. Outcomes suggest that it is possible to use machine learning and data analytics to surface and substantiate trends from CHSTA and similar social media archives that could provide immediately useful knowledge for crisis response, in controversial situations, or for public policy development, as well as for subsequent historical analysis. The approach shows potential for integrating multiple aspects of the archival workflow, and for supporting automatic iterative redescription and reappraisal activities in ways that make them more accountable and more rapidly responsive to changing societal interests and unfolding developments.
Addressing increasing calls to surface hidden and counter-narratives from within archival collections, this paper reports on a study that provides proof-of-concept of automatic methods that could be used on archived social media collections. Using a test collection of 3,457,434 unique tweets relating to COVID-19, China and Chinese people, it sought to identify instances of Hate Speech as well as hard-to-pinpoint trends in anti-Chinese racist sentiment. The study, part of a larger archival research effort investigating automatic methods for appraisal and description of very large digital archival collections, applied a Three-step Social Media Similarity (TSMS) mapping method, which aggregates hashtag mapping, TF-IDF Similarity Selection, and Emotion Similarity Calculation, to the test collection. Compared to a purely lexicon-based method for identifying and analyzing controversial speech, this method increased the amount of controversial content detected from 21,050 tweets to 212,605, and the detection rate from 0.6% to 6.1%. We argue that archives could similarly apply the TSMS method to automatically identify, analyze, and describe other controversial content on social media and in other rapidly evolving and complex contexts, in order to increase public awareness and facilitate public policy responses.
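As a rough illustration of what the second TSMS step, TF-IDF similarity selection, might look like, the sketch below scores candidate tweets against a small seed set and keeps those above a cosine-similarity threshold. The abstract does not give the authors' implementation, so the tokenizer, the smoothed IDF formula, the threshold value, and every function name here (`tokenize`, `tfidf_vectors`, `cosine`, `select_similar`) are assumptions made for illustration only, and the example texts are neutral placeholders rather than archive data.

```python
import math
from collections import Counter

PUNCT = ".,!?#@\"'"


def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation (an assumption)."""
    tokens = (word.strip(PUNCT).lower() for word in text.split())
    return [t for t in tokens if t]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    # Smoothed inverse document frequency, so no term gets a zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({t: term_freq[t] * idf[t] for t in term_freq})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_similar(seed_tweets, candidates, threshold=0.3):
    """Keep candidates whose best similarity to any seed tweet meets the threshold."""
    vectors = tfidf_vectors(seed_tweets + candidates)
    seed_vecs = vectors[:len(seed_tweets)]
    cand_vecs = vectors[len(seed_tweets):]
    selected = []
    for text, vec in zip(candidates, cand_vecs):
        score = max((cosine(vec, s) for s in seed_vecs), default=0.0)
        if score >= threshold:
            selected.append((text, score))
    return selected


if __name__ == "__main__":
    seeds = ["stay away from the virus carriers"]
    pool = ["virus carriers should stay away", "lovely weather for a picnic today"]
    for text, score in select_similar(seeds, pool):
        print(f"{score:.2f}  {text}")
```

In a pipeline of the kind described, the seed set would presumably come from an earlier step such as lexicon or hashtag matching, and the threshold would need to be tuned against labeled examples.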
This paper reports on a study using machine learning to identify incidences and shifting dynamics of hate speech in social media archives. To better meet the archival processing needs of such large-scale and fast-evolving archives, we propose the Data-driven and Circulating Archival Processing (DCAP) method. As a proof-of-concept, our study focuses on an English-language Twitter archive relating to COVID-19: tweets were repeatedly scraped between February and June 2020, ingested and aggregated within the COVID-19 Hate Speech Twitter Archive (CHSTA) and analyzed for hate speech using the Generative Adversarial Network (GAN)-inspired DCAP Method. Outcomes suggest that it is possible to use machine learning and data analytics to surface and substantiate trends from CHSTA and similar social media archives that could provide immediately useful knowledge for crisis response, in controversial situations, or for public policy development, as well as for subsequent historical analysis. The approach shows potential for integrating multiple aspects of the archival workflow, and for supporting automatic iterative redescription and reappraisal activities in ways that make them more accountable and more rapidly responsive to changing societal interests and unfolding developments.