Large amounts of data are becoming increasingly available online. To benefit from them, we need tools that retrieve the datasets most relevant to one's data needs. Several vocabularies have been developed to describe datasets and increase their discoverability, but it is costly and cumbersome for data publishers to annotate datasets using all of them, which raises the question of which properties matter most. In this work we contribute a systematic study of the patterns and specific attributes that data consumers use when searching for data, and of how these compare with general web search. We performed a query log analysis based on logs from four national open data portals and conducted a qualitative analysis of user data requests issued to one of them. Search queries issued on data portals differ from those issued to web search engines in their length, topic, and structure. Based on our findings, we hypothesise that portals' search functionalities are currently used in an exploratory manner rather than to retrieve a specific resource. In our study of data requests, we found that geospatial and temporal attributes, as well as information on the required granularity of the data, are the most common features. The findings of both analyses suggest that these features are more important in dataset retrieval than in general web search, and that dataset publishers should focus their efforts on generating dataset descriptions that include them.
Summarising data as text helps people make sense of it. It also improves data discovery, as search algorithms can match this text against keyword queries. In this paper, we explore the characteristics of text summaries of data in order to understand what meaningful summaries look like. We present two complementary studies: a data-search diary study with 69 students, which offers insight into the information needs of people searching for data; and a summarisation study, with a lab and a crowdsourcing component involving 80 data-literate participants overall, which produced summaries for 25 datasets. In each study we carried out a qualitative analysis to identify key themes and commonly mentioned dataset attributes that people consider when searching for and making sense of data. The results helped us design a template for creating more meaningful textual representations of data, alongside guidelines for improving the data-search experience overall.
Public procurement is a large market affecting almost every organisation and individual; therefore, governments need to ensure its efficiency, transparency, and accountability, while creating healthy, competitive, and vibrant economies. In this context, open data initiatives and the integration of data from multiple sources across national borders could transform the procurement market, for example by lowering the barriers to entry for smaller suppliers and encouraging healthier competition, in particular by enabling cross-border bids. More and more open data is being published in the public sector; however, these data are created and maintained in silos and are not straightforward to reuse or maintain because of technical heterogeneity, lack of quality, insufficient metadata, or missing links to related domains. To this end, we developed an open linked data platform, called TheyBuyForYou, consisting of a set of modular APIs and ontologies to publish, curate, integrate, analyse, and visualise an EU-wide, cross-border, and cross-lingual procurement knowledge graph. We developed advanced tools and services on top of the knowledge graph for anomaly detection, cross-lingual document search, and data storytelling. This article describes the TheyBuyForYou platform and knowledge graph, reports their adoption by different stakeholders and the challenges and experiences we encountered while creating them, and demonstrates the usefulness of Semantic Web and Linked Data technologies for enhancing public procurement.
Argumentation is a key aspect of communication and can broadly be broken down into problem solving (dialectic) and quarrelling (eristic). Techniques used within argumentation can likewise be classified as fact-based (logical) or emotion/audience-based (rhetorical). Modelling arguments on the social web is a challenge for those studying computational argumentation, as formal models of argumentation tend to assume a logical argument, whereas argumentation on the social web is often largely rhetorical. To investigate the application of logical versus rhetorical techniques on the social web, we bring together two ontologies used for modelling argumentation and online communities respectively: the Argument Interchange Format and the Semantic Interlinked Online Communities project. We augment these with our own ontology for modelling rhetorical argument, the Argumentation on the Social Web Ontology, and trial our additions by examining three case studies that follow argumentation on different categories of social media. Finally, we present examples of how rhetorical argumentation is used in the context of the social web and show that there are clear markers that allow a rudimentary classification of a social media post with regard to its contribution to a discussion.