Detecting strong ties among users in social and information networks is a fundamental operation that can improve performance on a multitude of personalization and ranking tasks. There are a variety of ways a tie can be deemed "strong", and in this work we use a data-driven (or supervised) approach by assuming that we are provided a sample set of edges labeled as strong ties in the network. Such labeled edges are often readily obtained from the social network, as users often participate in multiple overlapping networks via features such as following and messaging. These networks may vary greatly in size, density and the information they carry; for instance, a heavily used dense network (such as the network of followers) commonly overlaps with a secondary sparser network composed of strong ties (such as a network of email or phone contacts). This setting leads to a natural strong tie detection task: given a small set of labeled strong tie edges, how well can one detect unlabeled strong ties in the remainder of the network? This task becomes particularly daunting for the Twitter network due to the scant availability of pairwise relationship attribute data and the sparsity of strong tie networks such as phone contacts. Given these challenges, a natural approach is to instead use structural network features for the task, produced by combining the strong and "weak" edges. In this work, we demonstrate via experiments on Twitter data that using only such structural network features is sufficient for detecting strong ties with high precision. These structural network features are obtained from the presence and frequency of small network motifs on combined strong and weak ties. We observe that using motifs larger than triads alleviates sparsity problems that arise for smaller motifs, both because of increased combinatorial possibilities and because larger motifs benefit strongly from searching beyond the ego network.
Empirically, we observe that not all motifs are equally useful, and that they need to be carefully constructed from the combined edges in order to be effective for strong tie detection. Finally, we reinforce our experimental findings by providing a theoretical justification that suggests why incorporating these larger motifs as features could lead to increased performance in planted graph models.
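As a rough illustration of what motif-based structural features for a candidate edge might look like, the sketch below counts triads and 4-cycles through an edge in a combined graph of strong and weak ties. The abstract does not specify the exact motif set, so the choice of motifs, the function names (`build_adjacency`, `motif_features`), and the toy graph are all assumptions made for illustration, not the authors' method.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map from an iterable of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def motif_features(u, v, adj, strong):
    """Hypothetical motif features for the candidate edge (u, v).

    adj    : adjacency map over the combined (strong + weak) edges
    strong : set of frozenset({a, b}) pairs labeled as strong ties

    Returns (triads, four_cycles), where triads[k] is the number of
    triangles through (u, v) whose other two edges include k labeled
    strong ties, and four_cycles is the number of cycles u-a-b-v-u.
    """
    def is_strong(a, b):
        return frozenset((a, b)) in strong

    triads = {0: 0, 1: 0, 2: 0}
    for w in adj[u] & adj[v]:  # common neighbors close a triangle
        triads[is_strong(u, w) + is_strong(v, w)] += 1

    four_cycles = 0
    for a in adj[u] - {v}:
        for b in adj[v] - {u}:
            # a-b must be an edge; this search leaves the ego network of u and v
            if a != b and b in adj[a]:
                four_cycles += 1
    return triads, four_cycles


# Toy graph (made up for illustration): one triangle and one 4-cycle through (1, 2).
toy_edges = [(1, 2), (1, 3), (2, 3), (1, 4), (2, 5), (4, 5)]
toy_strong = {frozenset((1, 3)), frozenset((2, 3))}
toy_adj = build_adjacency(toy_edges)
toy_triads, toy_cycles = motif_features(1, 2, toy_adj, toy_strong)
```

In this toy graph the 4-cycle through nodes 4 and 5 contributes a signal even though the edge between 4 and 5 lies outside the neighborhoods shared by nodes 1 and 2, which is one way to read the remark about searching beyond the ego network.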
In many domains, a latent competition among different conventions determines which one will come to dominate. One sees such effects in the success of community jargon, of competing frames in political rhetoric, or of terminology in technical contexts. These effects have become widespread in the on-line domain, where the ease of information transmission makes them particularly forceful, and where the available data offers the potential to study competition among conventions at a fine-grained level. In analyzing the dynamics of conventions over time, however, even with detailed on-line data, one encounters two significant challenges. First, as conventions evolve, the underlying substance of their meaning tends to change as well; and such substantive changes confound investigations of social effects. Second, the selection of a convention takes place through the complex interactions of individuals within a community, and contention between the users of competing conventions plays a key role in the convention's evolution. Any analysis of the overall dynamics must take place in the presence of these two issues. In this work we study a setting in which we can cleanly track the competition among conventions while explicitly taking these sources of complexity into account. Our analysis is based on the spread of low-level authoring conventions in the e-print arXiv over 24 years and roughly a million posted papers: by tracking the spread of macros and other author-defined conventions, we are able to study conventions that vary even as the underlying meaning remains constant. We find that the interaction among co-authors over time plays a crucial role in the selection of conventions; the distinction between more and less experienced members of the community, and the distinction between conventions with visible versus invisible effects, are both central to the underlying processes.
Through our analysis we make predictions at the population level about the ultimate success of different synonymous conventions over time, and at the individual level about the outcome of "fights" between people over convention choices.
In the summer of 2013, Brazil experienced a period of conflict triggered by a series of protests. While the popular press covered the events, little empirical work has investigated how first-hand reporting of the protests occurred and evolved over social media and how such exposure in turn impacted the demonstrations themselves. In this study we examine over 42 million tweets shared during the three months of conflict in order to uncover patterns in online and offline protest-related activity as well as to explore relationships between language use in tweets and the emotions and underlying motivations of protesters. Our findings show that peaks in Twitter activity coincide with days on which heavy protesting took place, that the words in tweets reflect emotional characteristics of protest-related events, and, less expectedly, that these emotions convey both positive and negative sentiment.
Cascades on social and information networks have been a tremendously popular subject of study in the past decade, and there is a considerable literature on phenomena such as diffusion mechanisms, virality, cascade prediction, and peer network effects. Against the backdrop of this research, a basic question has received comparatively little attention: how desirable are cascades on a social media platform from the point of view of users? While versions of this question have been considered from the perspective of the producers of cascades, any answer to this question must also take into account the effect of cascades on their audience: the viewers of the cascade who do not directly participate in generating the content that launched it. In this work, we seek to fill this gap by providing a consumer perspective of information cascades. Users on social and information networks play the dual role of producers and consumers, and our work focuses on how users perceive cascades as consumers. Starting from this perspective, we perform an empirical study of the interaction of Twitter users with retweet cascades. We measure how often users observe retweets in their home timeline, and observe a phenomenon that we term the Impressions Paradox: the share of impressions for cascades of size k decays much more slowly than the frequency of cascades of size k. Thus, the audience for cascades can be quite large even for rare large cascades. We also measure audience engagement with retweet cascades in comparison to non-retweeted or organic content. Our results show that cascades often rival or exceed organic content in engagement received per impression. This result is perhaps surprising in that consumers didn't opt in to see tweets from these authors. Furthermore, although cascading content is widely popular, one would expect it to eventually reach parts of the audience that may not be interested in the content.
Motivated by the tension in these empirical findings, we posit a simple theoretical model that focuses on the effect of cascades on the audience (rather than the cascade producers). Our results on this model highlight the balance between retweeting as a high-quality content selection mechanism and the role of network users in filtering irrelevant content. In particular, the results suggest that together these two effects enable the audience to consume a high quality stream of content in the presence of cascades.
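The Impressions Paradox described in this abstract compares two distributions over cascade size k: the share of cascades that have size k and the share of impressions those cascades receive. A minimal sketch of that comparison follows. The function name `shares_by_size` and the toy numbers are hypothetical, and the assumption that impressions grow in proportion to cascade size is made here purely for illustration; it is not a result from the paper.

```python
from collections import defaultdict


def shares_by_size(cascades):
    """Compare frequency share and impression share per cascade size.

    cascades: iterable of (size, impressions) pairs, one per cascade.
    Returns {size: (frequency_share, impression_share)}.
    """
    counts = defaultdict(int)
    impressions_by_size = defaultdict(int)
    for size, impressions in cascades:
        counts[size] += 1
        impressions_by_size[size] += impressions

    total_count = sum(counts.values())
    total_impressions = sum(impressions_by_size.values())
    return {
        k: (counts[k] / total_count, impressions_by_size[k] / total_impressions)
        for k in sorted(counts)
    }


# Toy data (made up): many small cascades, few large ones, with
# impressions assumed proportional to cascade size.
toy_cascades = [(1, 10)] * 8 + [(10, 100)] * 2
toy_shares = shares_by_size(toy_cascades)
```

In this toy data, size-10 cascades are a minority of all cascades yet account for most impressions, which is the qualitative pattern the abstract describes: rare large cascades can still reach a large audience.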
An active line of research has studied the detection and representation of trends in social media content. There is still relatively little understanding, however, of methods to characterize the early adopters of these trends: who picks up on these trends at different points in time, and what is their role in the system? We develop a framework for analyzing the population of users who participate in trending topics over the course of these topics' lifecycles. Central to our analysis is the notion of a status gradient, describing how users of different activity levels adopt a trend at different points in time. Across multiple datasets, we find that this methodology reveals key differences in the nature of the early adopters in different domains.