Fast, massive, and viral data diffused on social media affects a large share of the online population, and thus the (prospective) information diffusion mechanisms behind it are of great interest to researchers. The (retrospective) provenance of such data is equally important because it contributes to the understanding of the relevance and trustworthiness of the information. Furthermore, computing provenance in a timely way is crucial for particular use cases and practitioners, such as online journalists who need to promptly assess specific pieces of information. Social media currently provide insufficient mechanisms for provenance tracking, publication and generation, while state-of-the-art social media research focuses mainly on explicit diffusion mechanisms (like retweets on Twitter or reshares on Facebook). Implicit diffusion mechanisms remain understudied because they are difficult to capture and to understand properly. On the technical side, the state of the art in provenance reconstruction evaluates small datasets after the fact, sidestepping the scale and speed requirements of current social media data. In this paper, we investigate the mechanisms of implicit information diffusion by computing its fine-grained provenance. We prove that explicit mechanisms are insufficient to capture influence, and our analysis unravels a significant part of implicit interactions and influence in social media. Our approach works incrementally and can be scaled up to cover a truly Web-scale scenario such as major events. We can process datasets of up to several million messages on a single machine at rates that cover bursty behaviour, without compromising result quality. By doing that, we provide to online journalists and social media […]
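The abstract above names the approach (incremental, fine-grained provenance reconstruction over high-volume message streams) but not the algorithm. As a purely hypothetical illustration of the general idea, the sketch below matches each incoming message against recent messages in a sliding time window and records sufficiently similar earlier messages as candidate implicit-diffusion sources; the window size, the Jaccard similarity, and the threshold are invented for illustration and are not the authors' method.

```python
# Hypothetical, heavily simplified sketch of incremental provenance reconstruction:
# compare each incoming message against recent messages in a sliding time window
# and record sufficiently similar earlier messages as candidate provenance sources.
from collections import deque

WINDOW_SECONDS = 3600       # only look back one hour (illustrative choice)
SIMILARITY_THRESHOLD = 0.5  # Jaccard similarity on token sets (illustrative choice)

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

recent = deque()  # (timestamp, message_id, token_set), oldest first

def process(message_id: str, timestamp: float, text: str) -> list[str]:
    """Return candidate implicit-diffusion sources for one incoming message."""
    tokens = set(text.lower().split())

    # Evict messages that fell out of the time window (keeps memory bounded,
    # which is what makes the approach incremental rather than after-the-fact).
    while recent and timestamp - recent[0][0] > WINDOW_SECONDS:
        recent.popleft()

    sources = [mid for ts, mid, toks in recent
               if jaccard(tokens, toks) >= SIMILARITY_THRESHOLD]

    recent.append((timestamp, message_id, tokens))
    return sources
```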
The archived web provides an important footprint of the past, documenting online social behaviour through social media, and news through media outlets' websites and government sites. Consequently, web archiving is increasingly gaining the attention of heritage institutions, academics and policy makers. The importance of web archives as data resources for (digital) scholars investigating the past has been acknowledged. Still, heritage institutions and academics struggle to keep pace with the fast-evolving World Wide Web and with the changing habits and practices of internet users. While a number of national institutions have set up a national framework to archive 'regular' web pages, social media archiving (SMA) is still in its infancy, with various countries starting up pilot archiving projects. SMA is not without challenges; the sheer volume of social media content, the lack of technical standards for capturing or storing social media data, and social media's ephemeral character can be impeding factors. The goal of this article is three-fold. First, we aim to extend the most recent descriptive state of the art of national web archiving, published in the first issue of the International Journal of Digital Humanities (March 2019), with information on SMA. Secondly, we outline the current legal, technical and operational aspects of archiving social media content (such as selection and preservation policies). This is complemented with results from an online survey to which 15 institutions responded. Finally, we discuss and reflect on important challenges in SMA that should be considered in future archiving projects.
The quality of knowledge graphs can be assessed by validation against specified constraints, which are typically use-case specific and modeled manually by human users. Visualizations can improve the modeling process as they are specifically designed for human information processing, possibly leading to more accurate constraints and, in turn, higher-quality knowledge graphs. However, it is currently unknown how such visualizations support users when viewing RDF constraints, as no scientific evidence for the visualizations' effectiveness is provided. Furthermore, some of the existing tools are likely suboptimal, as they lack support for edit operations or common constraint types. To establish a baseline, we have defined visual notations to represent RDF constraints and implemented them in UnSHACLed, a tool that is independent of a concrete RDF constraint language. In this paper, we (i) present two visual notations that support all SHACL core constraints, built upon the commonly used visualizations VOWL and UML, (ii) analyze both notations based on cognitively effective design principles, (iii) perform a comparative user study between both visual notations, and (iv) present our open source tool UnSHACLed incorporating our efforts. Users were presented with RDF constraints in both visual notations and had to answer questions based on visualization task taxonomies. Although no statistically significant difference in mean error rates was observed, all study participants preferred ShapeVOWL in a self-assessment to answer RDF constraint-related questions. Furthermore, ShapeVOWL adheres to more cognitively effective design principles according to our comparison. Study participants argued that the richer visual features of ShapeVOWL made it easier to spot constraints, but that a list of constraints, as in ShapeUML, is easier to read; they also noted that deviating more from the strict UML specification and introducing more visual features could improve ShapeUML. From these findings we conclude that ShapeVOWL has a higher potential to represent RDF constraints effectively than ShapeUML, but also that the clear and efficient text encoding of ShapeUML can be improved with visual features. A one-size-fits-all approach to RDF constraint visualization and editing will be insufficient. Therefore, to support different audiences and use cases, user interfaces of RDF constraint editors need to support different visual notations.
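For readers unfamiliar with SHACL, the snippet below shows a minimal example of the kind of RDF constraint that ShapeUML and ShapeVOWL visualize (a sh:NodeShape with sh:datatype and sh:minCount property constraints), validated here with the pyshacl library. The shape and the data are invented for illustration and are not taken from the study or from UnSHACLed.

```python
# Illustrative only: a minimal SHACL core constraint of the kind the
# visual notations represent, validated with rdflib + pyshacl.
from rdflib import Graph
from pyshacl import validate

shapes_ttl = """
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/> .

ex:PersonShape a sh:NodeShape ;
    sh:targetClass foaf:Person ;
    sh:property [
        sh:path foaf:name ;
        sh:datatype xsd:string ;
        sh:minCount 1            # every person needs at least one name
    ] .
"""

data_ttl = """
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/> .

ex:alice a foaf:Person .         # violates sh:minCount 1 on foaf:name
"""

shapes = Graph().parse(data=shapes_ttl, format="turtle")
data = Graph().parse(data=data_ttl, format="turtle")

conforms, _, report_text = validate(data, shacl_graph=shapes)
print(conforms)      # False
print(report_text)   # human-readable validation report
```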
Data management increasingly demands transparency with respect to data processing. Various stakeholders need information tailored to their needs, e.g. data management plans (DMPs) for funding agencies or privacy policies for the public. DMPs and privacy policies are just two examples of documents describing aspects of data processing, and dedicated tools to create both already exist. However, creating each of them manually or semi-automatically remains a repetitive and cognitively challenging task. We propose a data-driven approach that semantically represents the data processing itself as workflows and serves as a basis for different kinds of result sets generated with SPARQL, e.g. DMPs. Our approach is threefold: (i) users with domain knowledge semantically represent workflow components; (ii) other users can reuse these components to describe their data processing via semantically enhanced workflows; and, based on the semantic workflows, (iii) result sets are automatically generated on demand with SPARQL queries. This paper demonstrates our tool implementing the proposed approach, based on a use case of a researcher who needs to provide a DMP to a funding agency for the approval of a proposed research project.
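A minimal sketch of the described data flow, using rdflib and assuming a hypothetical vocabulary (ex:WorkflowStep, ex:producesOutput, ex:storageLocation are illustrative names, not the tool's actual schema): a workflow step is represented semantically, and a SPARQL query derives a DMP-style result set from it on demand.

```python
# Sketch only: describe a workflow step as RDF, then answer a DMP-style
# question ("which data does each step produce and where is it stored?")
# with a SPARQL query over that description.
from rdflib import Graph, Namespace, Literal, RDF

EX = Namespace("http://example.org/")

g = Graph()
step = EX.anonymiseSurveyData
g.add((step, RDF.type, EX.WorkflowStep))
g.add((step, EX.processes, EX.surveyResponses))
g.add((step, EX.producesOutput, EX.anonymisedDataset))
g.add((step, EX.storageLocation, Literal("institutional repository")))

dmp_query = """
PREFIX ex: <http://example.org/>
SELECT ?step ?output ?location WHERE {
    ?step a ex:WorkflowStep ;
          ex:producesOutput ?output ;
          ex:storageLocation ?location .
}
"""

for row in g.query(dmp_query):
    print(row.step, row.output, row.location)
```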