Proceedings of the Third Workshop on Abusive Language Online 2019
DOI: 10.18653/v1/w19-3514
Preemptive Toxic Language Detection in Wikipedia Comments Using Thread-Level Context

Abstract: We address the task of automatically detecting toxic content in user-generated texts. We focus on exploring the potential for preemptive moderation, i.e., predicting whether a particular conversation thread will, in the future, incite a toxic comment. Moreover, we perform a preliminary investigation of whether a model that jointly considers all comments in a conversation thread outperforms a model that considers only individual comments. Using an existing dataset of conversations among Wikipedia contributors as …

Cited by 28 publications (25 citation statements)
References 22 publications
“…It is, however, evident from our empirical results that incorporating context by providing the preceding comment to the model did not improve performance for either traditional machine learning or deep learning models. This finding is consistent with other studies that attempt to incorporate interactional context into their models (Karan and Šnajder, 2019). We believe that effectively incorporating deeper context, as opposed to just the preceding comment, using more sophisticated methods such as hierarchical neural networks might help improve performance.…”
Section: Context (supporting)
confidence: 90%
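The two conditions the statement above compares — classifying a comment in isolation versus prepending its preceding comment as context — can be sketched as follows. This is a hypothetical illustration, not the cited authors' code: the toy thread data, the `[SEP]` joining convention, and the TF-IDF + logistic regression setup are all assumptions for demonstration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy (preceding_comment, comment, is_toxic) triples -- illustrative only.
threads = [
    ("Thanks for the edit.", "No problem, glad to help.", 0),
    ("Why did you revert?", "Because your edit was garbage, idiot.", 1),
    ("Please cite a source.", "Here is the reference you asked for.", 0),
    ("Stop vandalizing pages.", "Shut up, nobody asked you.", 1),
]
labels = [toxic for _, _, toxic in threads]

# Condition A: each comment classified on its own.
comment_only = [comment for _, comment, _ in threads]
# Condition B: the preceding comment prepended as interactional context.
with_context = [f"{prev} [SEP] {comment}" for prev, comment, _ in threads]

def train(texts, labels):
    """Fit a TF-IDF + logistic regression pipeline on the given texts."""
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(texts, labels)
    return model

model_a = train(comment_only, labels)
model_b = train(with_context, labels)
```

Comparing held-out performance of `model_a` and `model_b` is the experiment in question; the quoted finding is that condition B did not outperform condition A.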
“…The authors showed that certain features in the first messages of a conversation, such as the use of first- or second-person pronouns and the presence of certain politeness strategies, can help predict whether that conversation will remain healthy or will degrade and lead to harmful messages later on. Their work inspired the authors of [13], who trained and tested an SVM using TF-IDF-weighted unigrams and bigrams, as well as a BiLSTM using their own word embeddings. They were ultimately dissatisfied with their results; however, the fact that they focused only on words and did not use more sophisticated features, such as those in [1], may be the cause.…”
Section: Related Work (mentioning)
confidence: 99%
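The SVM configuration attributed to [13] above — a linear SVM over TF-IDF-weighted unigrams and bigrams — can be sketched roughly as below. The first messages and derailment labels here are placeholders, not data from the cited work.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder first messages of conversation threads; labels mark whether
# the thread later derailed into toxicity (illustrative only).
first_messages = [
    "Could you please explain your revert? Thanks!",
    "You clearly have no idea what you are doing.",
    "I added a citation; let me know if it helps.",
    "This article is trash and so are its editors.",
]
derailed = [0, 1, 0, 1]

# TF-IDF over unigrams and bigrams (ngram_range=(1, 2)) feeding a linear SVM.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(first_messages, derailed)
preds = clf.predict(first_messages)
```

A BiLSTM variant would replace the TF-IDF features with word-embedding sequences, which is the second model the statement describes.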
“…Recent work on volunteer moderators and moderation mainly focuses on user-governed platforms such as Wikipedia [11,33] and Reddit [9,16,29]. Twitch, as a user-moderated, live-streaming community, is similar in some governance aspects to other online communities such as Reddit, which is a self-reliant community [29], and Facebook Groups, which provide multiparty interactions [51].…”
Section: Introduction (mentioning)
confidence: 99%