Misleading information is nothing new, yet its impacts seem only to grow. We investigate this phenomenon in the context of social bots: software agents that mimic humans and interact with them in support of specific agendas. This work explores the effect of social bots on the spread of misinformation on Facebook during the fall of 2016 and prototypes a tool for their detection. Using a dataset of about two million user comments discussing the posts of public pages for nine verified news outlets, we first annotate a large dataset for social bot activity. We then develop and evaluate commercially implementable bot detection software for public pages, achieving an overall F1 score of 0.71. Applying this software, we found that only a small percentage (0.06%) of the commenting user population were social bots. However, their activity was extremely disproportionate: they produced 3.5% of all comments, a per-user rate more than fifty times higher than average. Finally, we observe that one might commonly encounter social bot comments at a rate of about one in ten on mainstream-outlet, reliable-content news posts. In light of these findings, and to support page owners and their communities, we release prototype code and software to help moderate social bots on Facebook.
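The "more than fifty times higher" claim follows directly from the two percentages the abstract reports; a minimal check of that arithmetic (using only the figures given in the text, not the underlying counts, which are not stated here):

```python
# Checking the disproportion reported in the abstract.
# Only the two percentages come from the text; absolute counts are not given.
bot_user_share = 0.0006     # 0.06% of commenting users flagged as bots
bot_comment_share = 0.035   # 3.5% of all comments produced by bots

# Per-user commenting rate of bots relative to the average user:
relative_rate = bot_comment_share / bot_user_share
print(round(relative_rate, 1))  # 58.3, i.e. "more than fifty times higher"
```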
Veracity assessment of news and social bot detection have become two of the most pressing issues for social media platforms, yet current gold-standard data are limited. This paper presents a leap forward in the development of a sizeable and feature-rich gold-standard dataset. The dataset was built from a collection of news items posted to Facebook by nine news outlets during September 2016, which were annotated for veracity by BuzzFeed. These annotations go beyond a binary label, refining articles into four categories: mostly true, mostly false, mixture of true and false, and no factual content. Our contribution integrates data on Facebook comments and reactions that are publicly available through the platform's Graph API, and provides tailored tools for accessing news article web content. The features of the accessed articles include body text, images, links, Facebook plugin comments, Disqus plugin comments, and embedded tweets; embedded tweets provide a potent possible avenue for expansion across social media platforms. Upon development, this utility yielded over 1.6 million text items, making the dataset over 400 times larger than the current gold standard. The resulting dataset, BuzzFace, is presently the most extensive created, and allows for more robust machine learning applications to news veracity assessment and social bot detection than ever before.
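The comment and reaction data described above are retrieved through the Facebook Graph API's per-post edges. A minimal sketch of building such a request, assuming the publicly documented /{post-id}/comments edge; the post ID, token, and API version here are illustrative placeholders, not values from the paper:

```python
# Hedged sketch: constructing a Graph API request URL for a post's
# comment thread. Field names follow the documented Comment object;
# "PAGEID_POSTID" and "TOKEN" are placeholders.
from urllib.parse import urlencode

GRAPH = "https://graph.facebook.com/v2.8"  # a version current in 2016

def comments_url(post_id: str, token: str, limit: int = 100) -> str:
    """Build the request URL for a post's comments edge."""
    params = urlencode({
        "fields": "message,created_time,from,comment_count",
        "limit": limit,
        "access_token": token,
    })
    return f"{GRAPH}/{post_id}/comments?{params}"

print(comments_url("PAGEID_POSTID", "TOKEN"))
```

The JSON response pages through the thread via its "paging" cursors, so a full harvest loops over successive "next" URLs.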
We present a novel named entity recognition (NER) system and its participation in the Emerging and Rare Entity Recognition shared task, hosted at the 2017 EMNLP Workshop on Noisy User-Generated Text (W-NUT). With a specialized evaluation highlighting performance on rare and sparsely occurring named entities, this task provided an excellent opportunity to build out a newly developed statistical algorithm and benchmark it against the state of the art. Powered by flexible context features of word forms, our system's capacity for identifying never-before-seen entities made it well suited to the task. Since the system was developed to recognize only a limited number of named entity types, its overall performance was lower. However, performance was competitive on the categories it was trained on, indicating potential for future development.
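Word-form context features of the kind the abstract mentions typically map each token to a coarse "shape" and combine it with the shapes of its neighbors. The paper's exact feature set is not specified here, so the following is an illustrative sketch of the general technique, not the authors' implementation:

```python
# Hedged sketch of word-form context features common in statistical NER.
def word_shape(token: str) -> str:
    """Coarse shape: uppercase -> X, lowercase -> x, digit -> d."""
    out = []
    for ch in token:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)  # punctuation kept as-is
    return "".join(out)

def context_features(tokens: list[str], i: int) -> dict:
    """Features for token i drawn from its own form and its neighbors'."""
    feats = {"shape": word_shape(tokens[i]), "lower": tokens[i].lower()}
    if i > 0:
        feats["prev_shape"] = word_shape(tokens[i - 1])
    if i + 1 < len(tokens):
        feats["next_shape"] = word_shape(tokens[i + 1])
    return feats

print(word_shape("W-NUT2017"))  # X-XXXdddd
```

Shape features generalize across surface forms, which is what lets such a system score a capitalized token it has never seen as a plausible entity.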