While identifying specific user roles in social media, in particular bots or
spammers, has seen significant progress, generic and all-encompassing user
role classification remains elusive on the large data sets of today's social
media. Yet, such broad classifications enable a deeper understanding of user
interactions and pave the way for longitudinal studies, capturing the
evolution of users such as the rise of influencers. Studies of generic roles
have been performed predominantly on a small scale, establishing fundamental
role definitions, but relying mostly on ad-hoc, data set-dependent rules that
need to be carefully hand-tuned. We build on those studies and provide a
largely automated, scalable detection of a wide range of roles. Our approach
clusters users hierarchically on salient, complementary features such as
their actions, their ability to trigger reactions and their network
positions. To associate these clusters with roles, we use supervised
classifiers, trained by human experts on completely new media but
transferable to related data sets. Furthermore, we combine
samples to improve scalability and allow probabilistic assignments
of user roles. Our evaluation on Twitter indicates that a) stable and
reliable detection of a wide range of roles is possible, b) the labeling
transfers well as long as the fundamental properties don't strongly change
between data sets, and c) the approaches scale well with little need for human
intervention.
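
The pipeline summarized above (cluster sampled users on their features, label each cluster with a role, combine the labels across samples into probabilities) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the three normalized features, the role names, the user names, the naive centroid-linkage clustering cut at a fixed number of clusters, and the nearest-prototype rule that stands in for a supervised classifier trained on expert labels.

```python
import math
import random
from collections import Counter, defaultdict


def centroid(points):
    """Component-wise mean of a list of equal-length feature tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def agglomerate(users, features, k):
    """Naive agglomerative clustering (centroid linkage), stopped at k clusters."""
    clusters = [[u] for u in users]
    while len(clusters) > k:
        # Find the two clusters whose centroids are closest and merge them.
        _, a, b = min(
            (math.dist(centroid([features[u] for u in clusters[i]]),
                       centroid([features[u] for u in clusters[j]])), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[a].extend(clusters.pop(b))
    return clusters


def nearest_role(point, prototypes):
    """Stand-in for a trained classifier: pick the closest labeled prototype."""
    return min(prototypes, key=lambda role: math.dist(point, prototypes[role]))


def role_probabilities(features, prototypes, k, sample_size, rounds, seed=0):
    """Cluster several random samples and turn per-sample labels into frequencies."""
    rng = random.Random(seed)
    users = sorted(features)
    votes = defaultdict(Counter)
    for _ in range(rounds):
        sample = rng.sample(users, sample_size)
        for cluster in agglomerate(sample, features, k):
            center = centroid([features[u] for u in cluster])
            role = nearest_role(center, prototypes)
            for user in cluster:
                votes[user][role] += 1
    # Normalize the label counts of each user into a probability distribution.
    return {
        user: {role: n / sum(counts.values()) for role, n in counts.items()}
        for user, counts in votes.items()
    }


# Hypothetical features: (activity, reactions triggered, network centrality).
features = {
    "alice": (0.90, 0.90, 0.80), "bob": (0.85, 0.95, 0.90),
    "carol": (0.10, 0.05, 0.10), "dave": (0.15, 0.10, 0.05),
    "erin": (0.90, 0.05, 0.10), "frank": (0.95, 0.10, 0.15),
}
# Hypothetical expert-labeled prototypes, one per role.
prototypes = {
    "influencer": (0.9, 0.9, 0.9),
    "lurker": (0.1, 0.1, 0.1),
    "broadcaster": (0.9, 0.1, 0.1),
}
probs = role_probabilities(features, prototypes, k=3, sample_size=4, rounds=20)
```

With well-separated toy data like this, every sample assigns a user the same role, so each distribution collapses onto a single role. Users near a cluster boundary would instead receive mass on several roles, which is the kind of probabilistic assignment the abstract refers to. The pairwise search in `agglomerate` is quadratic per merge and is only meant for small samples.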