Sifting robotic from organic text: A natural language approach for detecting automation on Twitter

Clark, Eric M.; Williams, Jake Ryland; Jones, Chris; Galbraith, Richard A.; Danforth, Christopher M.; Dodds, Peter Sheridan

doi:10.1016/j.jocs.2015.11.002

Cited by 81 publications

(63 citation statements)

References 21 publications


“…The present study did not distinguish between human accounts and bots because the mechanisms for doing so are still in development [19]. Findings described herein could reflect, in part, sentiments of automated accounts.…”

Section: Discussion (mentioning)

confidence: 72%

Campaigns and counter campaigns: reactions on Twitter to e-cigarette education

et al. 2016


Background Social media present opportunities for public health departments to galvanise interest in health issues. A challenge is creating content that will resonate with target audiences, and determining reactions to educational material. Twitter can be used as a real-time surveillance system to capture individuals’ immediate reactions to education campaigns and such information could lead to better campaigns in the future. A case study testing Twitter’s potential presented itself when the California Department of Public Health launched its ‘Still Blowing Smoke’ media campaign about the potential harmful effects of e-cigarettes. Pro-e-cigarette advocacy groups, in response, launched a counter campaign titled ‘Not Blowing Smoke’. This study tracked the popularity of the two campaigns on Twitter, analysed the content of the messages and determined who was involved in these discussions. Methods The study period was from 22 March 2015 to 27 June 2015. A stratified sampling procedure supplied 2192 tweets for analysis. Content analysis identified pro, anti and neutral e-cigarette tweets, and five additional themes: Marketing Elements, Money, Regulation/propaganda, Health, and Other. Metadata were analysed to obtain additional information about Twitter accounts. Results ‘Not Blowing Smoke’ was referenced more frequently than ‘Still Blowing Smoke’ on Twitter. Messages commonly objected to government regulation of e-cigarettes, refuted claims that e-cigarette manufacturers were aligned with big tobacco, and touted the health benefits of e-cigarette use. E-cigarette companies and vape shops used campaign slogans to communicate with customers on Twitter. Conclusions Findings showed the time dynamics of Twitter and the possibility for real-time monitoring of education campaigns.


“…To allow for bot detection at the user level, all these methods still require the analysis of some historical user data, either by indirect data collection [10,11,13,57], or, like in the case of BotOrNot [15], by interrogating the Twitter API (which imposes strict rate limits, making it impossible to do large-scale bot detection). To the best of our knowledge, no tweet-based detection system existed prior to this work.…”

Section: Related Work (mentioning)

confidence: 99%

Deep neural networks for bot detection

Kudugunta

Ferrara

2018

Information Sciences



The problem of detecting bots, automated social media accounts governed by software but disguised as human users, has strong implications. For example, bots have been used to sway political elections by distorting online discourse, to manipulate the stock market, or to push anti-vaccine conspiracy theories that caused health epidemics. Most techniques proposed to date detect bots at the account level, by processing large amounts of social media posts, and leveraging information from network structure, temporal dynamics, sentiment analysis, etc. In this paper, we propose a deep neural network based on contextual long short-term memory (LSTM) architecture that exploits both content and metadata to detect bots at the tweet level: contextual features are extracted from user metadata and fed as auxiliary input to LSTM deep nets processing the tweet text. Another contribution that we make is proposing a technique based on synthetic minority oversampling to generate a large labeled dataset, suitable for deep nets training, from a minimal amount of labeled data (roughly 3,000 examples of sophisticated Twitter bots). We demonstrate that, from just one single tweet, our architecture can achieve high classification accuracy (AUC > 96%) in separating bots from humans. We apply the same architecture to account-level bot detection, achieving nearly perfect classification accuracy (AUC > 99%). Our system outperforms previous state of the art while leveraging a small and interpretable set of features yet requiring minimal training data.


“…The research presented here is one such example. Other examples include the classification system proposed by Chu et al [7,8], the crowd-sourcing detection framework by Wang et al [34], the NLP-based detection methods by Clark et al [9], and the BotOrNot classifier [11].…”

Section: Related Work (mentioning)

confidence: 99%

Measuring Bot and Human Behavioral Dynamics

Pozzana

Ferrara

2020

Front. Phys.


Bots, social media accounts controlled by software rather than by humans, have recently been under the spotlight for their association with various forms of online manipulation. To date, much work has focused on social bot detection, but little attention has been devoted to the characterization and measurement of the behavior and activity of bots, as opposed to humans'. Over the years, bots have become more sophisticated and capable of reflecting some short-term behavior, emulating that of human users. The goal of this paper is to study the behavioral dynamics that bots exhibit over the course of one activity session, and highlight if and how these differ from human activity signatures. By using a large Twitter dataset associated with recent political events, we first separate bots and humans, then isolate their activity sessions. We compile a list of quantities to be measured, like the propensity of users to engage in social interactions or to produce content. Our analysis highlights the presence of short-term behavioral trends in humans, which can be associated with a cognitive origin, that are absent in bots, intuitively due to their automated activity. These findings are finally codified to create and evaluate a machine learning algorithm to detect activity sessions produced by bots and humans, to allow for more nuanced bot detection strategies.
