The advent of social media has facilitated the study of information diffusion, user interaction and user influence over social networks. Research on information spreading has focused mostly on modeling, while analyses of real-life data have been limited to small, carefully cleaned datasets that are analyzed offline. In this paper, we present an approach for the online analysis of information diffusion on Twitter. From the stream of messages and the social graph, we reconstruct so-called information cascades, which model how information propagates from user to user. The results show that such inference is feasible even on noisy, large-scale, rapidly produced data. We provide insights into the impact of incomplete data and the effect of different influence models on the cascades. The observed cascades vary significantly in scale and structure.
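The idea of inferring a cascade from the message stream plus the social graph can be illustrated with a minimal sketch. It assumes that each retweet is attributed only to the root tweet (so the real parent must be inferred), that `follows[u]` holds the accounts `u` follows, and that two simple influence rules, "most recent exposure" and "first exposure", stand in for the influence models. All names (`Retweet`, `reconstruct_cascade`, the model labels) are hypothetical and this is not presented as the authors' algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Retweet:
    user: str
    time: float  # seconds since the root tweet was posted (assumed unit)


def reconstruct_cascade(root_user, retweets, follows, model="most_recent"):
    """Map each retweeter to the user it most likely saw the message from.

    follows[u] is the set of accounts u follows (hypothetical input format).
    model="most_recent" picks the latest earlier exposure among followed users,
    model="first" picks the earliest one. If no followed user has published
    the message yet (e.g. because of incomplete graph data), the retweet is
    attached to the root, mirroring how the raw stream attributes it.
    """
    parents = {}
    # (time, user) pairs of everyone who has already published the message
    published = [(0.0, root_user)]
    for rt in sorted(retweets, key=lambda r: r.time):
        followed = follows.get(rt.user, set())
        candidates = [(t, u) for t, u in published if u in followed and t < rt.time]
        if not candidates:
            parent = root_user
        elif model == "most_recent":
            parent = max(candidates, key=lambda c: c[0])[1]
        elif model == "first":
            parent = min(candidates, key=lambda c: c[0])[1]
        else:
            raise ValueError(f"unknown influence model: {model}")
        parents[rt.user] = parent
        published.append((rt.time, rt.user))
    return parents


if __name__ == "__main__":
    rts = [Retweet("bob", 10), Retweet("carol", 20), Retweet("dave", 30)]
    follows = {"bob": {"alice"}, "carol": {"alice", "bob"}, "dave": {"bob", "carol"}}
    print(reconstruct_cascade("alice", rts, follows))
    print(reconstruct_cascade("alice", rts, follows, model="first"))
```

Running both rules on the same input shows how the choice of influence model changes the cascade shape: under "most recent" the toy cascade becomes a chain, under "first" it becomes flatter.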
In recent years, research on information diffusion in social media has attracted considerable attention, since the data produced there is fast, massive and viral. The provenance of such data is equally important, because it helps to judge the relevance and trustworthiness of the information it contains. However, social media currently provide insufficient mechanisms for provenance, while models of information diffusion use their own concepts and notations, targeted at specific use cases. In this paper, we propose a model for information diffusion and provenance based on the W3C PROV Data Model. The advantage is that PROV is a Web-native, interoperable format that allows easy publication of provenance data and minimizes the integration effort among different systems that use PROV.
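To make the PROV mapping concrete, the sketch below expresses a single retweet with core W3C PROV terms in PROV-N notation: both tweets are entities, both users are agents, the retweet is an activity, and the retweet is derived from the original. The `ex:` namespace, the identifier scheme and the function names are hypothetical, and the sketch uses only generic PROV relations rather than any PROV-SAID-specific vocabulary.

```python
def retweet_to_prov_n(orig_id, orig_author, rt_id, rt_author, prefix="ex"):
    """Return PROV-N statements describing one retweet (hypothetical naming)."""
    act = f"{prefix}:retweeting_{rt_id}"
    e_orig, e_rt = f"{prefix}:tweet_{orig_id}", f"{prefix}:tweet_{rt_id}"
    ag_orig, ag_rt = f"{prefix}:{orig_author}", f"{prefix}:{rt_author}"
    return [
        f"entity({e_orig})",
        f"entity({e_rt})",
        f"agent({ag_orig})",
        f"agent({ag_rt})",
        f"activity({act})",
        f"wasAttributedTo({e_orig}, {ag_orig})",
        f"wasAttributedTo({e_rt}, {ag_rt})",
        # '-' marks an omitted (unknown) time, as allowed by PROV-N
        f"used({act}, {e_orig}, -)",
        f"wasGeneratedBy({e_rt}, {act}, -)",
        f"wasAssociatedWith({act}, {ag_rt}, -)",
        f"wasDerivedFrom({e_rt}, {e_orig})",
    ]


def to_document(statements, prefix="ex", namespace="http://example.org/"):
    """Wrap statements in a PROV-N document with a prefix declaration."""
    body = "\n".join("  " + s for s in statements)
    return f"document\n  prefix {prefix} <{namespace}>\n{body}\nendDocument"


if __name__ == "__main__":
    print(to_document(retweet_to_prov_n("1", "alice", "2", "bob")))
```

Because the output is plain PROV, any PROV-aware tool can consume it without knowing how the diffusion data was collected, which is the interoperability argument made above.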
To assess the trustworthiness of information on social media, a consumer needs to understand where the information comes from and which processes were involved in its creation. The entities, agents and activities involved in the creation of a piece of information are referred to as its provenance, whose representation has been standardized by W3C PROV. However, current social media APIs cannot always capture the full lineage of every message, leaving the consumer with incomplete or missing provenance, even though provenance is crucial for judging how far a message can be trusted. Therefore, in this paper, we propose an approach to reconstruct the provenance of messages on social media at multiple levels. To obtain fine-grained provenance, we use an approach from prior work to reconstruct information cascades with high certainty, and map them to PROV using the PROV-SAID extension for social media. To obtain coarse-grained provenance, we adapt our similarity-based, fuzzy provenance reconstruction approach, previously applied to news. We illustrate the power of the combination by reconstructing the provenance of a limited social media dataset gathered during the 2012 Olympics, in which we were able to reconstruct a significant number of previously unidentified connections.
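The coarse-grained, similarity-based level can be pictured with a deliberately simple stand-in: link each message to the most similar strictly earlier message if their word-level Jaccard similarity reaches a threshold. This is a swapped-in technique for illustration only; the tokenization, the Jaccard measure, the threshold of 0.5 and all function names are assumptions, not the fuzzy reconstruction approach referred to above.

```python
import re


def tokens(text):
    """Lowercased word tokens (a crude, assumed tokenization)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fuzzy_links(messages, threshold=0.5):
    """Link messages to likely earlier sources by textual similarity.

    messages: list of (msg_id, timestamp, text).
    Returns {msg_id: (source_id, similarity)} for every message whose best
    strictly earlier match reaches the (hypothetical) threshold. Ties are
    resolved in favour of the more recent candidate.
    """
    ordered = sorted(messages, key=lambda m: m[1])
    token_sets = {mid: tokens(text) for mid, _, text in ordered}
    links = {}
    for i, (mid, ts, _) in enumerate(ordered):
        best, best_sim = None, threshold
        for prev_id, prev_ts, _ in ordered[:i]:
            if prev_ts >= ts:
                continue  # only earlier messages can be sources
            sim = jaccard(token_sets[mid], token_sets[prev_id])
            if sim >= best_sim:
                best, best_sim = prev_id, sim
        if best is not None:
            links[mid] = (best, round(best_sim, 3))
    return links


if __name__ == "__main__":
    msgs = [
        ("m1", 0, "Bolt wins 100m gold in London"),
        ("m2", 5, "Usain Bolt wins 100m gold in London!"),
        ("m3", 9, "Great weather today"),
    ]
    print(fuzzy_links(msgs))
```

Such links carry less certainty than cascade edges inferred from explicit retweets, which is why they belong to the coarse-grained level rather than replacing the fine-grained one.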
The goal of this thesis is to investigate real-time analysis methods for social media, with a focus on information diffusion. From a conceptual point of view, we are interested in the structural, sociological and temporal aspects of information diffusion in social media, with particular emphasis on the real-time factor: what is happening right now. From a technical point of view, the sheer size of current social media services (hundreds of millions of users) and the large amount of data these users produce render conventional approaches to these costly analyses infeasible. We therefore need to go beyond state-of-the-art infrastructure for data-intensive computation. Our high-level goal is to investigate how information diffuses in real time over the underlying social network, and the role that different users play in the propagation process. We plan to implement these analyses on both complete and partially missing datasets and to compare the cost and quality of the two approaches.
This paper sheds light on the different types of interaction among social media users that benefit information diffusion and provenance analysis. In particular, we identify explicit and implicit interactions on Twitter, including informal conventions adopted by users. In our empirical evaluation, considering only retweets, the most common means of information propagation on Twitter, we can infer 50% of message provenance. Considering other types of interaction explains a further 13%. Accordingly, we enrich the PROV-SAID model for information diffusion, which extends the W3C PROV standard for provenance.
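How explicit and informal interactions can be told apart, and how coverage changes as more interaction types are admitted, can be sketched as follows. The tweet dictionary keys (`retweeted_user`, `reply_to_user`, `text`), the regular expressions for the informal "RT @user" and "via @user" conventions, and the priority order are all assumptions for illustration; a plain mention in particular is only weak evidence of provenance. The toy data is invented and says nothing about the 50% and 13% figures reported above.

```python
import re

# Hypothetical patterns for informal conventions; real tweets need more care.
MANUAL_RT = re.compile(r"\bRT\s*@(\w+)", re.IGNORECASE)
VIA = re.compile(r"\bvia\s*@(\w+)", re.IGNORECASE)
MENTION = re.compile(r"@(\w+)")


def interaction(tweet):
    """Return (interaction_type, source_user) for a tweet dict, or (None, None).

    Explicit signals (native retweet, reply) come from assumed metadata keys;
    informal ones (manual RT, via, mention) are parsed from the text.
    """
    if tweet.get("retweeted_user"):
        return "retweet", tweet["retweeted_user"]
    text = tweet.get("text", "")
    for kind, pattern in (("manual_rt", MANUAL_RT), ("via", VIA)):
        match = pattern.search(text)
        if match:
            return kind, match.group(1)
    if tweet.get("reply_to_user"):
        return "reply", tweet["reply_to_user"]
    match = MENTION.search(text)
    if match:
        return "mention", match.group(1)
    return None, None


def coverage(tweets, kinds):
    """Fraction of tweets whose source is explained by the given kinds."""
    if not tweets:
        return 0.0
    return sum(1 for t in tweets if interaction(t)[0] in kinds) / len(tweets)


if __name__ == "__main__":
    sample = [
        {"retweeted_user": "alice", "text": "RT @alice: hello"},
        {"text": "RT @bob great news"},
        {"text": "Medal table update via @carol"},
        {"text": "@dave congrats", "reply_to_user": "dave"},
        {"text": "no interaction here"},
    ]
    print(coverage(sample, {"retweet"}))
    print(coverage(sample, {"retweet", "manual_rt", "via", "reply", "mention"}))
```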