Abstract: Due to the dynamic nature of current software development methods, changes in requirements are embraced and given proper consideration. However, this triggers the rank reversal problem, which involves re-prioritizing requirements based on stakeholders' feedback. Doing so incurs significant cost because of the time spent in a large number of human interactions. To solve this issue, this paper presents SAFFRON, a Semi-Automated Framework for soFtware Requirements priOritizatioN. For a particular requirement, SAFFRON predicts appropriate stakeholders' ratings to reduce human interactions. First, item-item collaborative filtering is used to estimate the similarity between newly and previously elicited requirements. Using this similarity, the stakeholders most likely to rate the requirement are determined. Collaborative filtering based on a latent factor model is then used to predict those stakeholders' ratings. The proposed approach is implemented and tested on the RALIC dataset. The results show a consistent correlation with the ground truth, similar to state-of-the-art approaches. In addition, SAFFRON requires 13.5-27% less human interaction for re-prioritizing requirements.
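The first step of the pipeline summarized above (item-item collaborative filtering to find requirements similar to a newly elicited one) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the stakeholder-by-requirement rating matrix, the choice of cosine similarity, and the toy values are all assumptions made here for clarity.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two rating vectors (0.0 where undefined)."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return float(a @ b / (na * nb))

def most_similar_requirements(ratings, new_req, k=2):
    """ratings: stakeholders x requirements matrix of elicited ratings.
    new_req: rating vector for the new requirement (0 = not yet rated).
    Returns indices of the k existing requirements most similar to it."""
    sims = [cosine_similarity(ratings[:, j], new_req)
            for j in range(ratings.shape[1])]
    return sorted(range(len(sims)), key=lambda j: -sims[j])[:k]

# Toy example: 3 stakeholders, 4 previously rated requirements.
R = np.array([[5, 1, 4, 2],
              [4, 2, 5, 1],
              [1, 5, 2, 4]])
new = np.array([5, 4, 0])  # stakeholder 3 has not rated the new requirement
print(most_similar_requirements(R, new, k=2))  # -> [0, 2]
```

In the paper's second step, the stakeholders who rated these similar requirements would then have their ratings for the new requirement predicted via a latent factor model rather than asked for directly.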
Background: Since the advent of the COVID-19 pandemic, individuals of Asian descent (following colloquial usage prevalent in North America, where "Asian" refers to people from East Asia, particularly China) have been the subject of stigma and hate speech in both offline and online communities. One of the major venues for such unfair attacks is social networks, such as Twitter. As the research community seeks to understand, analyze, and implement detection techniques, high-quality data sets are becoming immensely important. Objective: In this study, we introduce a manually labeled data set of tweets containing anti-Asian stigmatizing content. Methods: We sampled over 668 million tweets posted on Twitter from January to July 2020 and used an iterative data construction approach that included 3 stages of algorithm-driven data selection. Volunteer annotators then manually labeled the tweets, yielding a high-quality data set of tweets and a second, smaller data set with higher-quality labels from multiple annotators. We present this final high-quality Twitter data set on stigma toward Chinese people during the COVID-19 pandemic. The data set and labeling instructions can be viewed in the GitHub repository: https://anonymous.4open.science/r/COVID-Stigma-A-Dataset-of-Anti-Asian-Stigmatizing-Tweets-During-COVID-19-65DD. Furthermore, we implemented several state-of-the-art models for detecting stigmatizing tweets to set initial benchmarks for our data set. Results: Our primary contributions are the labeled data sets. Data Set v3.0 contains 11,263 tweets with primary labels (unknown/irrelevant, not-stigmatizing, stigmatizing-low, stigmatizing-medium, stigmatizing-high) and tweet subtopics (eg, wet market and eating habits, COVID-19 cases, bioweapon). Data Set v3.1 contains 4998 (44.4%) tweets randomly sampled from Data Set v3.0, for which a second annotator assigned only the primary labels and a third annotator resolved conflicts between the first and second annotators.
To demonstrate the usefulness of our data set, preliminary experiments showed that the Bidirectional Encoder Representations from Transformers (BERT) model achieved the highest accuracy of 79% when detecting stigma on unseen data, while traditional models such as a support vector machine (SVM) performed at 73% accuracy. Conclusions: Our data set can serve as a benchmark for further qualitative and quantitative research and analysis of this issue. It first reaffirms the existence and significance of widespread discrimination and stigma toward the Asian population worldwide. Moreover, our data set and the accompanying analysis should assist researchers from various domains, including psychologists, public policy authorities, and sociologists, in analyzing the complex economic, political, historical, and cultural roots underlying anti-Asian stigmatization and hateful behavior. A manually annotated data set is of paramount importance for developing algorithms that detect stigma or problematic text, particularly on social media. We believe this contribution will help predict stigma, hate, and discrimination against marginalized populations during future crises like COVID-19, and subsequently help design interventions to reduce them.
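As a rough illustration of the traditional SVM baseline mentioned in the results, the sketch below trains a TF-IDF + linear SVM text classifier with scikit-learn. The example tweets, the binary label scheme, and the feature settings are invented here for illustration; the study's actual baseline would be trained on the released data set with its five primary labels.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented toy examples standing in for labeled tweets from the data set.
texts = [
    "hostile message blaming a group for the outbreak",
    "neutral tweet reporting daily covid-19 case counts",
    "another hostile message blaming the same group",
    "public health update, please stay safe everyone",
]
labels = ["stigmatizing", "not-stigmatizing",
          "stigmatizing", "not-stigmatizing"]

# Word and bigram TF-IDF features feeding a linear SVM.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["hostile message blaming a group"]))
```

In practice the baseline would be evaluated with a held-out split of the labeled tweets, reporting accuracy against the human annotations as the abstract does.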