Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018
DOI: 10.18653/v1/d18-1305

WikiConv: A Corpus of the Complete Conversational History of a Large Online Collaborative Community

Abstract: We present a corpus that encompasses the complete history of conversations between contributors to Wikipedia, one of the largest online collaborative communities. By recording the intermediate states of conversations (including not only comments and replies, but also their modifications, deletions and restorations), this data offers an unprecedented view of online conversation. This level of detail supports new research questions pertaining to the process (and challenges) of large-scale online collaboration. We il…

Cited by 29 publications (30 citation statements)
References 12 publications
“…We found that on these examples the silver labels have 0.51 precision and 1.00 recall. This yields 0.67 F1 measure and is somewhat lower than the expected 0.85 obtained for this classifier in (Hua et al, 2018). The difference indicates that the thresholds from (Hua et al, 2018) obtained on non-deleted comments from Wikipedia may not perform equally well on deleted comments.…”
Section: Dataset
Citation type: mentioning
confidence: 61%
“…This yields 0.67 F1 measure and is somewhat lower than the expected 0.85 obtained for this classifier in (Hua et al, 2018). The difference indicates that the thresholds from (Hua et al, 2018) obtained on non-deleted comments from Wikipedia may not perform equally well on deleted comments. To address this and increase the quality of the labels, more deleted comments should be manually labeled and thresholds retuned using, e.g., the same error rate method of (Wulczyn et al, 2017a).…”
Section: Dataset
Citation type: mentioning
confidence: 61%
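The F1 figure quoted in the Dataset statements follows from the harmonic mean of precision and recall. A minimal sketch, assuming the standard F1 definition; the function name `f1_score` is illustrative and does not come from the citing paper:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted in the citation statement: precision 0.51, recall 1.00.
f1 = f1_score(0.51, 1.00)
print(f"{f1:.3f}")  # prints 0.675
```

With the rounded inputs this gives about 0.675, which matches the quoted 0.67 up to rounding of the reported precision.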
“…Zhang et al's 'Conversations Gone Awry' dataset consists of 1,270 conversations that took place between Wikipedia editors on publicly accessible talk pages. The conversations are sourced from the WikiConv dataset (Hua et al, 2018) and labeled by crowdworkers as either containing a personal attack from within (i.e., hostile behavior by one user in the conversation directed towards another) or remaining civil throughout.…”
Section: Derailment Datasets
Citation type: mentioning
confidence: 99%
“…In this work we use the complete conversational history between English Wikipedia editors on both article and user talk pages. With over 90 million conversations between 4 million users on 24 million talk pages, this is one of the largest collections of public conversations [20].…”
Section: Blocks On Wikipedia
Citation type: mentioning
confidence: 99%