Recent discussions in many scientific disciplines stress the necessity of “FAIR” data. FAIR data, however, do not necessarily include information on data trustworthiness, where trustworthiness comprises reliability, validity, and provenience/provenance. This opens up the risk of misinterpreting scientific data even though all “FAIR” criteria are fulfilled. Applications such as secondary data processing, data blending, and joint interpretation or visualization efforts are especially affected. This paper intends to start a discussion in the scientific community about how to evaluate, describe, and implement trustworthiness in a standardized data evaluation approach and in its metadata description following the FAIR principles. Using soil moisture measurements as an example, it discusses different assessment tools for measurement, data processing, and visualization, and elaborates on which additional (metadata) information is required to increase the trustworthiness of data for secondary usage. Taking into account the perspectives of data collectors, providers, and users, the authors identify three aspects of data trustworthiness that promote efficient data sharing: 1) trustworthiness of the measurement, 2) trustworthiness of the data processing, and 3) trustworthiness of the data integration and visualization. The paper should be seen as the basis for a community discussion on data trustworthiness for scientifically sound secondary use of data. We do not intend to replace existing procedures, and we do not claim completeness of the reliable tools and approaches described. Our intention is to discuss several important aspects of assessing data trustworthiness, based on the data life cycle of soil moisture data as an example.
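To make the three aspects concrete, the following is a minimal, purely illustrative sketch of what a FAIR metadata record augmented with trustworthiness information could look like. All field names and values here are assumptions for illustration, not an existing standard or the schema proposed by the authors.

```python
# Hypothetical metadata record for a soil moisture time series, extending a
# FAIR description with the three trustworthiness aspects named above.
# Every field name and value below is an illustrative assumption.
record = {
    "variable": "soil_moisture",
    "units": "m3 m-3",
    "trustworthiness": {
        "measurement": {                       # aspect 1: the measurement
            "sensor_type": "TDR probe",        # assumed example sensor
            "calibration_date": "2021-03-15",
            "stated_uncertainty": "+/-0.03 m3 m-3",
        },
        "processing": {                        # aspect 2: data processing
            "qc_flags_applied": ["range_check", "spike_removal"],
            "processing_software": "custom QC v1.2",  # assumed example
        },
        "integration": {                       # aspect 3: integration/visualization
            "spatial_support": "point measurement, 0-30 cm depth",
            "provenance": "link/DOI to the raw source data set",
        },
    },
}

# A secondary user could check programmatically that all three aspects are
# documented before blending this series with other data sets.
aspects = ("measurement", "processing", "integration")
missing = [a for a in aspects if a not in record["trustworthiness"]]
print("missing aspects:", missing)
```

Such a machine-readable structure would let data portals filter or flag data sets by the completeness of their trustworthiness description.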
<p>A sensor network surrounds the island of Helgoland, supplying marine data centers with autonomous measurements of variables such as temperature, salinity, chlorophyll, and oxygen saturation. The output is a data collection containing information about the complex conditions around Helgoland, which lies at the transition between coastal waters and the open sea. Spatio-temporal phenomena, such as passing river plumes and pollutant influx during flood events, can be found in this data set. Through the data provided by the existing measurement network, these events can be detected and investigated.</p><p>Because of its important role in understanding the transition between coastal and open-sea conditions, there are plans to augment the sensor network around Helgoland with another underwater sensor station, an Underwater Node (UWN). The new node is supposed to optimally complement the existing sensor network. Therefore, it makes sense to place it in an area that is not yet well represented by other sensors. The exact spatial and temporal extent of the area of representativity around a sensor, i.e. the region assumed to exhibit statistical conditions similar to those the sensor measures, is hard to determine. This is difficult to specify in the complex system around Helgoland and might change with both space and time.</p><p>Using an unsupervised machine learning approach, I determine areas of representativity around Helgoland with the goal of finding an ideal placement for a new sensor node. The areas of representativity are identified by clustering a dataset containing time series of the existing sensor network and complementary model data for a period of several years. 
The computed areas of representativity are compared to the existing sensor placements to decide where to deploy the additional UWN in order to achieve good coverage for further investigations of spatio-temporal phenomena.</p><p>A challenge that occurs during the clustering analysis is to determine whether the spatial areas of representativity remain stable enough over time to base long-term sensor placement decisions on the results. I compare results across different periods of time and investigate how fast areas of representativity change spatially with time, and whether there are areas that remain stable over the course of several years. This also provides insights into the occurrence and long-term behavior of spatio-temporal events around Helgoland.</p><p>Whether spatial areas of representativity remain temporally stable enough to be taken into account when augmenting sensor networks influences future network design decisions. This way, the extended sensor network can capture a greater variety of the spatio-temporal phenomena around Helgoland, as well as allow an overview of the long-term behavior of the marine system.</p>
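The abstract does not name a specific clustering algorithm, so the following is only a minimal sketch of the general idea, assuming a plain k-means clustering of per-location time series (here replaced by synthetic data with two made-up regimes, "coastal" and "open sea"): locations whose series fall into the same cluster form one candidate area of representativity.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for the Helgoland data: one time series per grid cell,
# drawn from two artificial regimes. The real analysis would use observed
# sensor time series plus complementary model output.
n_cells, n_steps = 40, 100
coastal = rng.normal(12.0, 2.0, size=(n_cells // 2, n_steps))
open_sea = rng.normal(9.0, 0.5, size=(n_cells // 2, n_steps))
series = np.vstack([coastal, open_sea])

def kmeans(X, k, n_iter=50, seed=0):
    """Plain k-means; each resulting cluster is one candidate
    'area of representativity' of similar statistical conditions."""
    local_rng = np.random.default_rng(seed)
    centers = X[local_rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every cell's time series to its nearest cluster center.
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # Move each center to the mean of its assigned series.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

labels, centers = kmeans(series, k=2)
print("cluster sizes:", np.bincount(labels, minlength=2))
```

Temporal stability, as discussed above, could then be probed by rerunning the clustering on different time windows and comparing how the cluster membership of each cell changes.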
<p>A common challenge for projects with multiple involved research institutes is a well-defined and productive collaboration. All parties measure and analyze different aspects, depend on each other, share common methods, and exchange the latest results, findings, and data. Today this exchange is often impeded by a lack of ready access to shared computing and storage resources. In our talk, we present a new and innovative remote procedure call (RPC) framework. We focus on a distributed setup, where project partners do not necessarily work at the same institute and do not have access to each other's resources.</p><p>We present the prototype of an application programming interface (API) developed in Python that enables scientists to collaboratively explore and analyze sets of distributed data. It offers the functionality to request remote data through a comfortable interface, and to share and invoke single computational methods or even entire analytical workflows and their results. The prototype enables researchers to make their methods accessible as a backend module running on their own infrastructure. Hence, researchers from other institutes may apply the available methods through a lightweight Python or JavaScript API. This API transforms standard Python calls into requests to the backend process on the remote server. In the end, the overhead for both the backend developer and the remote user is very low. The effort of implementing the necessary workflow and API usage is comparable to writing code in a non-distributed setup. 
Besides that, data do not have to be downloaded locally; the analysis can be executed “close to the data” while using the institutional infrastructure where the eligible data set is stored.</p><p>With our prototype, we demonstrate distributed data access and analysis workflows across institutional borders to enable effective scientific collaboration, thus deepening our understanding of the Earth system.</p><p>This framework has been developed in a joint effort of the DataHub and Digital Earth initiatives within the research centers of the Helmholtz-Gemeinschaft Deutscher Forschungszentren e.V. (Helmholtz Association of German Research Centres, HGF).</p>
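The abstract does not expose the prototype's actual API, so the following is only a hedged sketch of the general pattern it describes: a client-side proxy object that turns ordinary Python method calls into serialized requests for a remote backend. All names (`RemoteBackend`, `mean_salinity`, the URL) are invented for illustration; the stand-in transport executes the request locally, whereas a real client would POST the payload over HTTP.

```python
import json

class RemoteBackend:
    """Illustrative proxy: ordinary attribute calls become RPC payloads.
    Names and payload format are assumptions, not the prototype's API."""

    def __init__(self, base_url, transport):
        self.base_url = base_url
        self._transport = transport  # callable(url, payload) -> result

    def __getattr__(self, method_name):
        # Called for unknown attributes: wrap the method name, positional
        # arguments, and keyword arguments into a JSON request.
        def call(*args, **kwargs):
            payload = json.dumps(
                {"method": method_name, "args": args, "kwargs": kwargs}
            )
            return self._transport(f"{self.base_url}/rpc", payload)
        return call

# Stand-in transport that "executes" the request locally; a real transport
# would send the payload to the backend process on the remote server.
def echo_transport(url, payload):
    request = json.loads(payload)
    if request["method"] == "mean_salinity":
        values = request["args"][0]
        return sum(values) / len(values)
    raise NotImplementedError(request["method"])

backend = RemoteBackend("https://data.example-institute.de", echo_transport)
result = backend.mean_salinity([34.1, 34.5, 34.3])  # reads like a local call
print(result)
```

The point of the pattern is the last line: from the user's perspective the remote method behaves like a local function, which is what keeps the overhead for the remote user low.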