Norm Violation in Online Communities – A Study of Stack Overflow Comments

Cheriyan, Jithin; Savarimuthu, Bastin Tony Roy; Cranefield, Stephen

doi:10.1007/978-3-030-72376-7_2

Cited by 8 publications

(4 citation statements)

References 13 publications


“…The sizes of the datasets after sampling are shown in Table 1. To identify the offensive comments, we obtained the Perspective API (PAPI) score and Regular Expression (Regex) status of the comments since these measures have been used by the SO Heat Detection bot to detect toxicity in SO comments. PAPI offers a 'toxicity' range for a text in the interval of 0 and 1, where 0 represents least toxic and 1 represents extreme toxicity.…”

Section: Methods (mentioning)

confidence: 99%

“…The nature of gender hostility in SO has been analysed by Brooke [3]. Cheriyan et al [7] report norm violations in SO and provide manual analysis of comments to show that offence and unfriendliness also exist in SO [7]. Stress owing to toxicity in open source communities has been investigated by Raman et al [30].…”

Section: Related Work 21 Offensive Language Detection (mentioning)

confidence: 99%

“…Then we manually evaluated the comments of these platforms that had both a Regex match presence and a PAPI score >=0.7. Subsequently, we used multi-label classification of offensive comments based on the taxonomy generated by Cheriyan et al [7]. For this work, the taxonomy has three classes: Personal, Racial and Swearing.…”

Section: Manual Classification (mentioning)

confidence: 99%
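
The selection rule quoted above (a Regex match together with a PAPI score >= 0.7) can be sketched roughly as follows. This is only an illustration under stated assumptions: the patterns in OFFENSIVE_PATTERNS are hypothetical placeholders, not the expressions used by the SO Heat Detection bot, and the PAPI toxicity scores are assumed to have been obtained beforehand rather than fetched from the API here.

```python
import re
from dataclasses import dataclass

# Hypothetical placeholder patterns; the expressions actually used by the
# SO Heat Detection bot are not given in the citation statements.
OFFENSIVE_PATTERNS = [
    re.compile(r"\bidiot\b", re.IGNORECASE),
    re.compile(r"\bstupid\b", re.IGNORECASE),
]

# Threshold quoted in the citation statement (PAPI score >= 0.7).
PAPI_THRESHOLD = 0.7


@dataclass
class Comment:
    text: str
    papi_score: float  # toxicity in [0, 1], assumed to be precomputed


def has_regex_match(text: str) -> bool:
    """Return True if any placeholder pattern occurs in the text."""
    return any(pattern.search(text) for pattern in OFFENSIVE_PATTERNS)


def select_for_manual_review(comments):
    """Keep comments that have both a Regex match and a score >= threshold."""
    return [
        c for c in comments
        if has_regex_match(c.text) and c.papi_score >= PAPI_THRESHOLD
    ]


# Illustrative, made-up comments and scores.
sample = [
    Comment("You are an idiot", 0.92),
    Comment("That is a stupid question", 0.55),
    Comment("Thanks, this works", 0.02),
    Comment("What an awful person", 0.85),
]
selected = select_for_manual_review(sample)
```

Requiring both signals keeps the manual workload small: a comment that only matches a pattern, or only scores highly, is not passed on for manual evaluation in this sketch.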


Towards offensive language detection and reduction in four Software Engineering communities

Cheriyan, Savarimuthu, Cranefield. 2021

Evaluation and Assessment in Software Engineering

Self Cite


Software Engineering (SE) communities such as Stack Overflow have become unwelcoming, particularly through members' use of offensive language. Research has shown that offensive language drives users away from active engagement within these platforms. This work aims to explore this issue more broadly by investigating the nature of offensive language in comments posted by users in four prominent SE platforms: GitHub, Gitter, Slack and Stack Overflow (SO). It proposes an approach to detect and classify offensive language in SE communities by adopting natural language processing and deep learning techniques. Further, a Conflict Reduction System (CRS), which identifies offence and then suggests what changes could be made to minimize offence, has been proposed. Beyond showing the prevalence of offensive language in over 1 million comments from four different communities, which ranges from 0.07% to 0.43%, our results show promise in successful detection and classification of such language. The CRS has the potential to drastically reduce manual moderation efforts to detect and reduce offence in SE communities.


“…Such differences make chats in the synchronous domain more difficult to be moderated by existing approaches. Founta et al, 2018; Basile et al, 2019; ElSherief et al, 2021), Reddit (Datta and Adar, 2019; Kumar et al, 2018; Park et al, 2021), Stackoverflow (Cheriyan et al, 2017) and Github (Miller et al, 2022), efforts that extend them to live streaming platforms have been absent. In this paper, we study unique characteristics of comments in livestreaming services and develop new datasets and models for appropriately using contextual information to automatically moderate toxic content and norm violations.…”

Section: Introduction (mentioning)

confidence: 99%

Analyzing Norm Violations in Live-Stream Chat

Moon, Lee, Cho et al. 2023

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing


Toxic language, such as hate speech, can deter users from participating in online communities and enjoying popular platforms. Previous approaches to detecting toxic language and norm violations have been primarily concerned with conversations from online forums and social media, such as Reddit and Twitter. These approaches are less effective when applied to conversations on live-streaming platforms, such as Twitch and YouTube Live, as each comment is only visible for a limited time and lacks a thread structure that establishes its relationship with other comments. In this work, we share the first NLP study dedicated to detecting norm violations in conversations on live-streaming platforms. We define norm violation categories in live-stream chats and annotate 4,583 moderated comments from Twitch. We articulate several facets of live-stream data that differ from other forums, and demonstrate that existing models perform poorly in this setting. By conducting a user study, we identify the informational context humans use in live-stream moderation, and train models leveraging context to identify norm violations. Our results show that appropriate contextual information can boost moderation performance by 35%.


Analyzing the Correlation Between Toxic Comments and Code Quality

Sayago‐Heredia, Sailema, Pérez‐Castillo et al. 2024

J Software Evolu Process


Software development has a relevant human side, and this could, for example, imply that developers' feelings have an impact on certain aspects of software development such as quality, productivity, or performance. This paper explores the effects of toxic emotions on code quality and presents the SentiQ tool, which gathers and analyzes sentiments from commit messages (obtained from GitHub) and code quality measures (obtained from SonarQube). The SentiQ tool we proposed performs a sentiment analysis (based on natural language processing techniques) and relates the results to the code quality measures. The datasets extracted are then used as the basis on which to conduct a preliminary case study, which demonstrates that there is a relationship between toxic comments and code quality that may affect the quality of the whole software project. This has resulted in the drafting of a predictive model to validate the correlation of the impact of toxic comments on code quality. The main implication of this work is that these results could, in the future, make it possible to estimate code quality as a function of developers' toxic comments.
