This version is available at https://strathprints.strath.ac.uk/64004/ Strathprints is designed to allow users to access the research output of the University of Strathclyde. Unless otherwise explicitly stated on the manuscript, Copyright © and Moral Rights for the papers on this site are retained by the individual authors and/or other copyright owners. Please check the manuscript for details of any other licences that may have been applied. You may not engage in further distribution of the material for any profitmaking activities or any commercial gain. You may freely distribute both the url (https://strathprints.strath.ac.uk/) and the content of this paper for research or private study, educational, or not-for-profit purposes without prior permission or charge.Any correspondence concerning this service should be sent to the Strathprints administrator: strathprints@strath.ac.ukThe Strathprints institutional repository (https://strathprints.strath.ac.uk) is a digital archive of University of Strathclyde research outputs. It has been developed to disseminate open access research outputs, expose data about those outputs, and enable the management and persistent access to Strathclyde's intellectual output. Abstract. The value of microblogging services (such as Twitter) and social networks (such as Facebook) in disseminating and discussing important events is currently under serious threat from automated or human contributors employed to distort information. While detecting coordinated attacks by their behaviour (e.g. different accounts posting the same images or links, fake profiles, etc.) has been already explored, here we look at detecting coordination in the content (words, phrases, sentences). We are proposing a metric capable of capturing the differences between organic and coordinated posts, which is based on the estimated probability of coincidentally repeating a word sequence. Our simulation results support our conjecture that only when the metric takes the context and the properties of the repeated sequence into consideration, it is capable of separating organic and coordinated content. We also demonstrate how those context-specific adjustments can be obtained using existing resources.
Towards Measuring Content Coordination in Microblogs