Sharing scientific data with the objective of making it discoverable, accessible, reusable, and interoperable requires effort and raises challenges that are largely addressed at the disciplinary level, in particular defining how the data should be formatted and described. This paper represents the proceedings of a session held at SciDataCon 2016 (Denver, 12-13 September 2016). It explores how a range of disciplines, namely materials science, crystallography, astronomy, earth sciences, humanities and linguistics, organize themselves at the international level to address these challenges. For each discipline, the culture with respect to data sharing, the science drivers, the organization, the lessons learnt and the elements of the data infrastructure that are or could be shared with others are briefly described. Commonalities and differences are assessed. Common key elements for success are identified: data sharing should be science driven; defining the disciplinary part of interdisciplinary standards is necessary but challenging; and sharing of applications should accompany data sharing. Incentives, such as journal and funding agency requirements, are also similar across disciplines. For all of them, the social aspects are more challenging than the technological ones. Governance is more diverse and often specific to the way a discipline is organized. Being problem-driven is also a key factor of success for building bridges that enable interdisciplinary research. Several international data organizations, such as CODATA, RDA and WDS, can facilitate the establishment of disciplinary interoperability frameworks. As a spin-off of the session, an RDA Disciplinary Interoperability Interest Group is proposed to bring together representatives across disciplines to better organize and drive the discussion, and to prioritize, harmonize and efficiently articulate disciplinary needs.
XML has been designed for creating structured documents, but the information that is encoded in these structures is, by definition, out of the scope of XML itself. Additional sources that are normally not easily interpretable by computers, such as documentation, are needed to determine the intention behind specific tags in a tag set. The Component Metadata Infrastructure (CMDI) takes a rather pragmatic approach to fostering interoperability between XML instances in the domain of metadata descriptions for language resources. This paper gives an overview of this approach.
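To make the problem concrete, the sketch below contrasts a bare XML record, whose tag names carry no machine-readable semantics, with a CMDI-style record in which each element points to an explicit concept definition that software can resolve. This is only an illustration: the element names, attribute name, and concept URIs are invented here and are not CMDI's actual schema.

```python
# Minimal sketch: bare XML tags vs. elements annotated with concept links.
import xml.etree.ElementTree as ET

# A plain XML snippet: nothing tells a program what <name> or <lang> mean here.
plain_record = "<resource><name>Corpus A</name><lang>nl</lang></resource>"

# A CMDI-flavoured snippet (simplified, hypothetical component): each element
# carries a ConceptLink attribute referencing a persistent concept definition.
cmdi_record = """
<Component name="ExampleResource">
  <name ConceptLink="http://example.org/concepts/resourceName">Corpus A</name>
  <lang ConceptLink="http://example.org/concepts/languageCode">nl</lang>
</Component>
"""

def explain(record: str) -> None:
    """Print, for each element, whatever semantics the record itself exposes."""
    for elem in ET.fromstring(record).iter():
        concept = elem.get("ConceptLink", "no machine-readable definition")
        print(f"<{elem.tag}> -> {concept}")

explain(plain_record)  # every tag: "no machine-readable definition"
explain(cmdi_record)   # annotated tags resolve to a concept URI
```

The point of the contrast is that interoperability then rests on shared concept definitions rather than on every consumer knowing every tag set in advance.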
A wide variety of scientific user communities have been working with data for many years and thus already operate a wide variety of data infrastructures in production today. The aim of this paper is therefore not to create one new general data architecture, which would fail to be adopted by every individual user community. Instead, this contribution aims to design a reference model with abstract entities that is able to federate existing concrete infrastructures under one umbrella. A reference model is an abstract framework for understanding significant entities and the relationships between them, and thus helps in understanding existing data infrastructures when comparing them in terms of functionality, services, and boundary conditions. An architecture derived from such a reference model can then be used to create a federated architecture that builds on the existing infrastructures, which could align to a major common vision. This common vision is named 'ScienceTube' in this contribution and determines the high-level goal that the reference model aims to support. This paper describes how a well-focused use case around data replication and its related activities in the EUDAT project provides a first step towards this vision. Concrete stakeholder requirements arising from scientific end users, such as those of the European Strategy Forum on Research Infrastructures (ESFRI) projects, underpin this contribution with clear evidence that the EUDAT activities are bottom-up, thus providing real solutions to the 'high-level big data challenges' that are so often only described. The federated approach, taking advantage of community and data centers (with large computational resources), further describes how data replication services enable data-intensive computing on terabytes or even petabytes of data emerging from ESFRI projects.
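As a rough illustration of what "abstract entities federated under one umbrella" can mean in practice, the sketch below models community and data centres plus a replication service as plain objects. It is not the EUDAT reference model or its services; all class, method, and identifier names are invented for this example.

```python
# Hypothetical sketch of reference-model entities: centres holding data objects
# and a replication service that copies an object to partner data centres.
from dataclasses import dataclass, field

@dataclass
class DataCentre:
    name: str
    holdings: set = field(default_factory=set)  # identifiers of stored objects

@dataclass
class ReplicationService:
    """Copies objects from a source centre to a set of partner data centres."""
    partners: list

    def replicate(self, object_id: str, source: DataCentre) -> None:
        if object_id not in source.holdings:
            raise ValueError(f"{object_id} not held by {source.name}")
        for target in self.partners:
            target.holdings.add(object_id)  # stand-in for an actual data transfer

# Usage: one community centre, two large data centres acting as replica sites.
community = DataCentre("community-centre", {"hdl:123/abc"})
replicas = [DataCentre("data-centre-1"), DataCentre("data-centre-2")]
ReplicationService(replicas).replicate("hdl:123/abc", community)
print([d.holdings for d in replicas])
```

The value of such an abstraction is that any concrete infrastructure can be compared against the same small set of entities and relationships, independent of the technology it runs on.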
Automated processing of large amounts of scientific data, especially across domains, requires that the data can be selected and parsed without human intervention. Precise characterization of that data, as in typing, is needed once the processing goes beyond the realm of domain-specific or local research group assumptions. The Research Data Alliance (RDA) Data Type Registries Working Group (DTR-WG) was assembled to address this issue through the creation of a Data Type Registry methodology, data model, and prototype. The WG was approved by the RDA Council in March 2013 and will complete its work in mid-2014, between the third and fourth RDA Plenaries.
Problem Statement
Automated selection and processing of large amounts of scientific data, especially across domains where domain-specific tools cannot be used, requires that the data can be parsed and interpreted without human intervention. Within a given domain that functionality can simply be built into the software, e.g., the piece of information that appears in this location is always a temperature reading in centigrade or, at a different level of granularity, this data set is structured according to Domain Standard A including base types X, Y, and Z, where the base types are things like temperature readings in centigrade. This knowledge, easily available within a given domain or a set of closely related research groups, can be built into processing workflows. But outside of that domain or environment, the approach of relying on implicit knowledge embedded in domain-specific tools can begin to fail, and more precision in associating data with the information needed to process it is required. This applies across time as well as across domains: what is well known today may be less well known twenty years hence, but age will not necessarily reduce the value of a data set and may indeed increase it.
Description and Use of Types
We are using the term 'type' here as the characterization of data structure at multiple levels of granularity, from individual data points up to and including large data sets. Optimizing the interactions among all of the producers and consumers of digital data requires that those types be defined and permanently associated with the data.
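To illustrate the idea of resolving registered types rather than relying on implicit domain conventions, here is a minimal sketch under assumed names: a registry maps persistent type identifiers to just enough structural information for software to interpret a value. The identifiers, registry fields, and example record are invented and stand in for a lookup against an actual Data Type Registry.

```python
# Hypothetical in-memory stand-in for a Data Type Registry lookup.
TYPE_REGISTRY = {
    "example.org/type/temperature-centigrade": {
        "description": "Temperature reading in degrees Celsius",
        "value_type": float,
        "unit": "degC",
    },
    "example.org/type/station-id": {
        "description": "Observation station identifier",
        "value_type": str,
        "unit": None,
    },
}

def parse_field(type_id: str, raw_value: str):
    """Interpret a raw string using only what the registry says about its type."""
    entry = TYPE_REGISTRY[type_id]  # in practice: a query to a remote registry
    return entry["value_type"](raw_value), entry["unit"]

# A record whose fields are tagged with type identifiers instead of relying
# on implicit, domain-specific knowledge built into the processing software.
record = [("example.org/type/station-id", "STATION-42"),
          ("example.org/type/temperature-centigrade", "21.5")]

print([parse_field(tid, value) for tid, value in record])
# -> [('STATION-42', None), (21.5, 'degC')]
```

Because the type identifiers are resolvable and persistent, a consumer twenty years from now could in principle interpret the same record without access to the original domain tooling.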