As the COVID-19 pandemic has unfolded, Hate Speech on social media about China and Chinese people has encouraged social stigmatization. For historical and humanistic purposes, this history-in-the-making needs to be archived and analyzed. Using the query “china+and+coronavirus” to scrape the Twitter API, we obtained 3,457,402 key tweets about China relating to COVID-19. In this archive, in which about 40% of the tweets are from the U.S., we identify 25,467 Hate Speech occurrences and analyze them according to lexicon-based emotions and demographics using machine learning and network methods. The results indicate substantial associations between the amount of Hate Speech and both the sentiments expressed and state demographic factors. Sentiments of surprise and fear associated with poverty and unemployment rates are prominent. This digital archive and the related analyses are therefore not simply historical: they play vital roles in raising public awareness and mitigating future crises. Consequently, we regard our research as a pilot study of analytical methods that other researchers in various fields might use.
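Lexicon-based emotion analysis, as mentioned above, works by looking up each word of a tweet in an emotion lexicon and tallying the emotions that word maps to. The sketch below illustrates that idea with a tiny, hypothetical lexicon (`TOY_LEXICON`) and simple regex tokenization; it is not the lexicon or pipeline used in the study, and a real analysis would substitute a published emotion lexicon with thousands of entries.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (word -> set of emotions), for illustration only.
TOY_LEXICON = {
    "shocked": {"surprise"},
    "unexpected": {"surprise"},
    "afraid": {"fear"},
    "virus": {"fear"},
    "panic": {"fear"},
    "hope": {"anticipation", "joy"},
}


def tokenize(text):
    """Lowercase a tweet and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def emotion_counts(text, lexicon=TOY_LEXICON):
    """Count how many tokens in `text` are associated with each emotion."""
    counts = Counter()
    for token in tokenize(text):
        for emotion in lexicon.get(token, ()):
            counts[emotion] += 1
    return counts


def aggregate(tweets, lexicon=TOY_LEXICON):
    """Sum emotion counts over a collection of tweets (e.g. all tweets from one state)."""
    total = Counter()
    for text in tweets:
        total.update(emotion_counts(text, lexicon))
    return total
```

Per-state totals produced this way could then be compared with demographic variables such as poverty or unemployment rates; that modeling step is not shown.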
On March 16, 2021, six Asian women were killed in Atlanta, US, possibly out of racist motivation. This tragic event, now known as the 2021 Atlanta Spa Shootings, precipitated a massive increase in the volume of declarations and discussion countering anti-Asian hate on social media platforms such as Twitter. In a pilot study to chronicle and profile public opinions, social movements, and patterns in the global Twitter discourse, we scraped the Twitter API using the query term "StopAsianHate", obtaining more than 5.5 million tweets and their metadata. By using social movement analytical frameworks to analyze traffic peaks and hashtag use, we identified a set of more than 300 frequently used hashtags that can serve as specific query words in future archival ingest activities, and we characterized the dimensions of, and current problems with, this social movement. This suggests the utility of this approach both for archiving applications and for socio-political analyses of emerging topics and concerns.
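A minimal sketch of the two measurements described above, assuming tweets are available as plain text with a calendar date: ranking hashtags by the number of tweets that use them, and flagging days whose volume rises well above average. The function names, the lowercasing of tags, and the mean-plus-k-standard-deviations peak rule are illustrative assumptions, not the study's documented procedure.

```python
import re
import statistics
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def hashtags(text):
    """Extract hashtags from a tweet, lowercased so #StopAsianHate == #stopasianhate."""
    return [tag.lower() for tag in HASHTAG.findall(text)]


def top_hashtags(tweets, n):
    """Return the n hashtags used by the most tweets (each tag counted once per tweet)."""
    counts = Counter()
    for text in tweets:
        counts.update(set(hashtags(text)))
    return counts.most_common(n)


def peak_days(dated_tweets, k=1.0):
    """Return days whose tweet volume exceeds mean + k * population std dev.

    `dated_tweets` is an iterable of (date_string, text) pairs.
    """
    volume = Counter(day for day, _ in dated_tweets)
    values = list(volume.values())
    threshold = statistics.mean(values) + k * statistics.pstdev(values)
    return sorted(day for day, count in volume.items() if count > threshold)
```

Hashtags that rank highly during traffic peaks are natural candidates for the kind of query-term list the abstract describes.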
This paper reports on a study to identify the incidence and shifting dynamics of hate speech in English-language tweets relating to the COVID-19 pandemic that were posted between February and June 2020. Tweets were repeatedly scraped, ingested, and aggregated within the COVID-19 Hate Speech Twitter Archive (CHSTA) and analyzed for hate speech using the Generative Adversarial Network (GAN)-inspired DCAP Method. Outcomes suggest that machine learning and data analytics can be used to surface and substantiate trends from CHSTA and similar social media archives that could provide immediately useful knowledge for crisis response, for controversial situations, or for public policy development, as well as for subsequent historical analysis. The approach shows potential for integrating multiple aspects of the archival workflow and for supporting automatic, iterative redescription and reappraisal in ways that make these activities more accountable and more rapidly responsive to changing societal interests and unfolding developments.
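Because tweets were scraped repeatedly, the same tweet can arrive in several batches. The sketch below shows one way to merge batches by tweet ID and then track the monthly share of flagged tweets, a simple view of "shifting dynamics." It assumes each tweet is a dict with hypothetical `id`, `created_at` (ISO 8601 date string), and `is_hate` fields; the `is_hate` label stands in for the output of a classifier such as DCAP, which is not reproduced here.

```python
from collections import defaultdict


def merge_scrapes(batches):
    """Merge repeated scrape batches into one archive keyed by tweet ID.

    Later batches overwrite earlier copies of the same tweet, so each
    tweet is counted once no matter how many scrapes returned it.
    """
    archive = {}
    for batch in batches:
        for tweet in batch:
            archive[tweet["id"]] = tweet
    return archive


def monthly_hate_share(archive):
    """Fraction of archived tweets flagged as hate speech, per YYYY-MM month."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for tweet in archive.values():
        month = tweet["created_at"][:7]  # assumes ISO 8601 dates, e.g. "2020-03-14"
        totals[month] += 1
        flagged[month] += int(tweet["is_hate"])  # hypothetical classifier label
    return {month: flagged[month] / totals[month] for month in sorted(totals)}
```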
Addressing increasing calls to surface hidden narratives and counter-narratives within archival collections, this paper reports on a study that provides a proof of concept for automatic methods that could be applied to archived social media collections. Using a test collection of 3,457,434 unique tweets relating to COVID-19, China, and Chinese people, it sought to identify instances of Hate Speech as well as hard-to-pinpoint trends in anti-Chinese racist sentiment. The study, part of a larger archival research effort investigating automatic methods for the appraisal and description of very large digital archival collections, applied to the test collection a Three-step Social Media Similarity (TSMS) mapping method that aggregates hashtag mapping, TF-IDF Similarity Selection, and Emotion Similarity Calculation. Compared with a purely lexicon-based method for identifying and analyzing controversial speech, this method expanded the number of controversial tweets detected from 21,050 to 212,605, raising the detection rate from 0.6% to 6.1%. We argue that archives could similarly apply the TSMS method to automatically identify, analyze, and describe other controversial content on social media and in other rapidly evolving, complex contexts, in order to increase public awareness and facilitate public policy responses.
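The TF-IDF Similarity Selection step named above can be sketched as follows, assuming it compares candidate tweets against a seed set of already-identified tweets. This stdlib-only version builds TF-IDF vectors (with a smoothed IDF, an assumption borrowed from common practice), keeps candidates whose cosine similarity to any seed meets a hypothetical `threshold`, and includes a small helper for the detection-rate arithmetic. The hashtag-mapping and emotion-similarity steps are not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens ('#' is dropped)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF: log((1 + n) / (1 + df)) + 1 keeps every weight positive.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def select_similar(seeds, candidates, threshold=0.3):
    """Keep candidates whose best similarity to any seed is >= threshold."""
    if not seeds:
        return []
    vecs = tfidf_vectors(seeds + candidates)
    seed_vecs, cand_vecs = vecs[:len(seeds)], vecs[len(seeds):]
    return [c for c, cv in zip(candidates, cand_vecs)
            if max(cosine(cv, sv) for sv in seed_vecs) >= threshold]


def detection_rate(detected, total):
    """Share of the collection flagged, as a percentage."""
    return 100.0 * detected / total
```

With the figures reported above, `detection_rate(21050, 3457434)` is about 0.61 and `detection_rate(212605, 3457434)` is about 6.15, which round to the stated 0.6% and 6.1%.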
Data archives are an important source of high-quality data in many fields, making them ideal sites for studying data reuse. By studying data reuse through citation networks, we can learn how hidden research communities (those that use the same scientific datasets) are organized. This paper analyzes the community structure of an authoritative network of datasets cited in academic publications collected by a large social science data archive, the Inter-university Consortium for Political and Social Research (ICPSR). Through network analysis, we identified communities of social science datasets and fields of research connected through shared data use. We argue that communities of exclusive data reuse form “subdivisions” that contain valuable disciplinary resources, while datasets at a “crossroads” broadly connect research communities. Our research reveals the hidden structure of data reuse and demonstrates how interdisciplinary research communities organize around datasets as shared scientific inputs. These findings contribute new ways of describing scientific communities in order to understand the impacts of research data reuse.
Peer Review: https://publons.com/publon/10.1162/qss_a_00209
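The kind of network analysis described above can start from a dataset co-citation graph, in which two datasets are linked when the same publication cites both. The sketch below builds that graph and groups datasets with asynchronous label propagation, a simple community-detection technique chosen here for brevity; it is not necessarily the algorithm the authors used, and the paper and dataset IDs are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cocitation_graph(papers):
    """Build a weighted co-citation graph.

    `papers` maps a paper ID to the dataset IDs it cites. The result maps
    each dataset to {neighbor dataset: number of papers citing both}.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for datasets in papers.values():
        for a, b in combinations(sorted(set(datasets)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return {node: dict(nbrs) for node, nbrs in graph.items()}


def label_propagation(graph, max_iter=100):
    """Asynchronous label propagation with deterministic tie-breaking.

    Each dataset repeatedly adopts the label carrying the most edge weight
    among its neighbors; ties go to the smallest label.
    """
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            weights = defaultdict(int)
            for nbr, w in graph[node].items():
                weights[labels[nbr]] += w
            best = max(weights.values())
            new_label = min(lab for lab, w in weights.items() if w == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels
```

In this framing, tightly connected groups could be read as “subdivisions,” while a dataset whose neighbors carry many different labels would be a candidate for the “crossroads” role.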