A perennial criticism of the use of social media in social science research is the lack of demographic information associated with naturally occurring mediated data such as that produced by Twitter. However, the fact that demographic information is not explicit does not mean that it is not implicitly present. Utilising the Cardiff Online Social Media ObServatory (COSMOS), this paper suggests various techniques for establishing or estimating demographic data from a sample of more than 113 million Twitter users collected during July 2012. We discuss in detail the methods that can be used for identifying gender and language, and illustrate that the proportion of males and females using Twitter in the UK reflects the gender balance observed in the 2011 Census. We also expand on the three types of geographical information that can be derived from tweets, either directly or by proxy, and on how spatial information can be used to link social media with official curated data. Whilst we make no grand claims about the representative nature of Twitter users in relation to the wider UK population, the derivation of demographic data demonstrates the potential of new social media (NSM) for the social sciences. We consider this paper a clarion call and hope that other researchers will test the methods we suggest and develop them further.
Little is currently known about the factors that promote the propagation of information in online social networks following terrorist events. In this paper we took the case of the terrorist event in Woolwich, London in 2013 and built models to predict information flow size and survival using data derived from the popular social networking site Twitter. We define information flows as the propagation over time of information posted to Twitter via the action of retweeting. Following a comparison of different predictive methods, and because of the distribution exhibited by our dependent size measure, we used the zero-truncated negative binomial (ZTNB) regression method. To model survival, the Cox regression technique was used because it estimates proportional hazard rates for independent measures. Following a principal component analysis to reduce the dimensionality of the data, social, temporal and content factors of the tweet were used as predictors in both models. Given the likely emotive reaction caused by the event, we emphasize the influence of emotive content on propagation in the discussion section. From a sample of Twitter data collected following the event (N = 427,330), we report novel findings that the sentiment expressed in a tweet is a statistically significant predictor of both the size and the survival of information flows of this nature. Furthermore, the number of offline press reports relating to the event published on the day the tweet was posted was a significant predictor of size, as was the tension expressed in a tweet in relation to survival. Finally, time lags between retweets and the co-occurrence of URLs and hashtags also emerged as significant.
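The two model families named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the negative binomial is written in the common mean/dispersion (NB2) form as an assumption, and the Cox example uses a single covariate with Breslow-style handling of ties. The first function gives the log-probability of a flow size when zero counts cannot occur; the second gives the partial log-likelihood that a Cox regression would maximise.

```python
import math


def nb_log_pmf(y, mu, alpha):
    """Log-pmf of a negative binomial with mean mu and dispersion alpha (NB2 form, assumed)."""
    r = 1.0 / alpha          # shape parameter
    p = r / (r + mu)         # "success" probability
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(p) + y * math.log1p(-p))


def ztnb_log_pmf(y, mu, alpha):
    """Log-pmf of the zero-truncated negative binomial: P(Y = y | Y > 0)."""
    if y < 1:
        raise ValueError("a zero-truncated count must be at least 1")
    log_p0 = nb_log_pmf(0, mu, alpha)
    # Renormalise by the probability mass that remains after removing zero.
    return nb_log_pmf(y, mu, alpha) - math.log1p(-math.exp(log_p0))


def cox_partial_loglik(times, events, x, beta):
    """Cox partial log-likelihood for one covariate x and coefficient beta.

    times  : observed survival times
    events : 1 if the event was observed, 0 if censored
    """
    total = 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if not d_i:
            continue  # censored observations contribute only through risk sets
        # Risk set: every subject still under observation at time t_i.
        risk = sum(math.exp(beta * x_j)
                   for t_j, x_j in zip(times, x) if t_j >= t_i)
        total += beta * x_i - math.log(risk)
    return total
```

In a regression setting, `mu` would typically be tied to the predictors through a log link and `beta` would be estimated by maximising the partial log-likelihood; both steps are omitted here to keep the sketch short.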
In this paper, we reflect on the disciplinary contours of contemporary sociology, and social science more generally, in the age of 'big and broad' social data. Our aim is to suggest how sociology and social sciences may respond to the challenges and opportunities presented by this 'data deluge' in ways that are innovative yet sensitive to the social and ethical life of data and methods. We begin by reviewing relevant contemporary methodological debates and consider how they relate to the emergence of big and broad social data as a product, reflexive artefact and organizational feature of emerging global digital society. We then explore the challenges and opportunities afforded to social science through the widespread adoption of a new generation of distributed, digital technologies and the gathering momentum of the open data movement, grounding our observations in the work of the Collaborative Online Social Media ObServatory (COSMOS) project. In conclusion, we argue that these challenges and opportunities motivate a renewed interest in the programme for a 'public sociology', characterized by the co-production of social scientific knowledge involving a broad range of actors and publics.