Leon Derczynski scite author profile

Media is full of false claims. Even Oxford Dictionaries named "post-truth" as the word of 2016. This makes it more important than ever to build systems that can identify the veracity of a story, and the nature of the discourse around it. RumourEval is a SemEval shared task that aims to identify and handle rumours, and reactions to them, in text. We present an annotation scheme and a large dataset covering multiple topics, each having their own families of claims and replies, and use these to pose two concrete challenges. We also report the results achieved by participants on these challenges.

SemEval-2019 Task 7: RumourEval, Determining Rumour Veracity and Support for Rumours

Gorrell, Kochkina, Liakata et al. 2019

175

198

This is the proposal for RumourEval-2019, which will run in early 2019 as part of that year's SemEval event. Since the first RumourEval shared task in 2017, interest in automated claim validation has greatly increased, as the dangers of "fake news" have become a mainstream concern. Yet automated support for rumour checking remains in its infancy. For this reason, it is important that a shared task in this area continues to provide a focus for effort, which is likely to increase. We therefore propose a continuation in which the veracity of further rumours is determined and, as before and in support of this goal, tweets discussing them are classified according to the stance they take regarding the rumour. Scope is extended compared with the first RumourEval, in that the dataset is substantially expanded to include Reddit as well as Twitter data, and additional languages are also included.

Results of the WNUT2017 Shared Task on Novel and Emerging Entity Recognition

Derczynski, Nichols, Erp et al. 2017

280

194

This shared task focuses on identifying unusual, previously-unseen entities in the context of emerging discussions. Named entities form the basis of many modern approaches to other tasks (like event clustering and summarization), but recall on them is a real problem in noisy text, even among annotators. This drop tends to be due to novel entities and surface forms. Take for example the tweet "so.. kktny in 30 mins?!": even human experts find the entity kktny hard to detect and resolve. The goal of this task is to provide a definition of emerging and of rare entities and, based on that, datasets for detecting these entities. The task as described in this paper evaluated the ability of participating entries to detect and classify novel and emerging named entities in noisy text.

Analysis of named entity recognition and linking for tweets

Derczynski, Maynard, Rizzo et al. 2015

Information Processing & Management

291

186

Applying natural language processing for mining and intelligent information access to tweets (a form of microblog) is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Information extraction from tweets is typically performed in a pipeline, comprising consecutive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation (e.g. with respect to DBpedia). In this work, we describe a new Twitter entity disambiguation dataset, and conduct an empirical analysis of named entity recognition and disambiguation, investigating how robust a number of state-of-the-art systems are on such noisy texts, what the main sources of error are, and which problems should be further investigated to improve the state of the art.

Directions in abusive language training data, a systematic review: Garbage in, garbage out

2020

Data-driven and machine learning based approaches for detecting, categorising and measuring abusive content such as hate speech and harassment have gained traction due to their scalability, robustness and increasingly high performance. Making effective detection systems for abusive content relies on having the right training datasets, reflecting a widely accepted mantra in computer science: Garbage In, Garbage Out. However, creating training datasets which are large, varied, theoretically-informed and that minimize biases is difficult, laborious and requires deep expertise. This paper systematically reviews 63 publicly available training datasets which have been created to train abusive language classifiers. It also reports on the creation of a dedicated website for cataloguing abusive language data, hatespeechdata.com. We discuss the challenges and opportunities of open science in this field, and argue that although more dataset sharing would bring many benefits, it also poses social and ethical risks which need careful consideration. Finally, we provide evidence-based recommendations for practitioners creating new abusive content training datasets.

SemEval-2017 Task 8: RumourEval: Determining rumour veracity and support for rumours