Felix Nikolaus Wirth scite author profile

Background The novel coronavirus SARS-CoV-2 rapidly spread around the world, causing the disease COVID-19. To contain the virus, much hope is placed on participatory surveillance using mobile apps, such as automated digital contact tracing, but broad adoption is an important prerequisite for associated interventions to be effective. Data protection aspects are a critical factor for adoption, and privacy risks of solutions developed often need to be balanced against their functionalities. This is reflected by an intensive discussion in the public and the scientific community about privacy-preserving approaches. Objective Our aim is to inform the current discussions and to support the development of solutions providing an optimal balance between privacy protection and pandemic control. To this end, we present a systematic analysis of existing literature on citizen-centered surveillance solutions collecting individual-level spatial data. Our main hypothesis is that there are dependencies between the following dimensions: the use cases supported, the technology used to collect spatial data, the specific diseases focused on, and data protection measures implemented. Methods We searched PubMed and IEEE Xplore with a search string combining terms from the area of infectious disease management with terms describing spatial surveillance technologies to identify studies published between 2010 and 2020. After a two-step eligibility assessment process, 27 articles were selected for the final analysis. We collected data on the four dimensions described as well as metadata, which we then analyzed by calculating univariate and bivariate frequency distributions. Results We identified four different use cases, which focused on individual surveillance and public health (most common: digital contact tracing). We found that the solutions described were highly specialized, with 89% (24/27) of the articles covering one use case only. 
Moreover, we identified eight different technologies used for collecting spatial data (most common: GPS receivers) and five different diseases covered (most common: COVID-19). Finally, we identified six different data protection measures (most common: pseudonymization). As hypothesized, we identified relationships between the dimensions. We found that for highly infectious diseases such as COVID-19 the most common use case was contact tracing, typically based on Bluetooth technology. For managing vector-borne diseases, use cases require absolute positions, which are typically measured using GPS. Absolute spatial locations are also important for further use cases relevant to the management of other infectious diseases. Conclusions We see a large potential for future solutions supporting multiple use cases by combining different technologies (eg, Bluetooth and GPS). For this to be successful, however, adequate privacy-protection measures must be implemented. Technologies currently used in this context probably cannot offer enough protection. We therefore recommend that future solutions consider using modern privacy-enhancing techniques (eg, from the area of secure multiparty computing and differential privacy).
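The univariate and bivariate frequency distributions mentioned in the Methods can be illustrated with a minimal Python sketch. The records below are invented toy data, not the 27 reviewed articles, and the field names `use_case` and `technology` as well as the helper functions are assumptions chosen for illustration only.

```python
from collections import Counter

# Hypothetical charting records (toy data, NOT taken from the review).
records = [
    {"use_case": "contact tracing", "technology": "Bluetooth"},
    {"use_case": "contact tracing", "technology": "Bluetooth"},
    {"use_case": "contact tracing", "technology": "GPS"},
    {"use_case": "vector surveillance", "technology": "GPS"},
]


def univariate(rows, key):
    """Count how often each value of a single dimension occurs."""
    return Counter(row[key] for row in rows)


def bivariate(rows, key_a, key_b):
    """Count how often each pair of values from two dimensions co-occurs."""
    return Counter((row[key_a], row[key_b]) for row in rows)


use_case_counts = univariate(records, "use_case")
pair_counts = bivariate(records, "use_case", "technology")

if __name__ == "__main__":
    for value, n in use_case_counts.most_common():
        print(f"{value}: {n}/{len(records)} ({n / len(records):.0%})")
    for (a, b), n in pair_counts.most_common():
        print(f"{a} x {b}: {n}")
```

The bivariate counts are what would reveal a dependency between two dimensions, such as one use case co-occurring mostly with one technology.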

Privacy-preserving data sharing infrastructures for medical research: systematization and comparison

Wirth, Meurers, Johns, et al. 2021. BMC Med Inform Decis Mak

Background Data sharing is considered a crucial part of modern medical research. Unfortunately, despite its advantages, it often faces obstacles, especially data privacy challenges. As a result, various approaches and infrastructures have been developed that aim to ensure that patients and research participants remain anonymous when data is shared. However, privacy protection typically comes at a cost, e.g. restrictions regarding the types of analyses that can be performed on shared data. What is lacking is a systematization making the trade-offs taken by different approaches transparent. The aim of the work described in this paper was to develop a systematization for the degree of privacy protection provided and the trade-offs taken by different data sharing methods. Based on this contribution, we categorized popular data sharing approaches and identified research gaps by analyzing combinations of promising properties and features that are not yet supported by existing approaches. Methods The systematization consists of different axes. Three axes relate to privacy protection aspects and were adopted from the popular Five Safes Framework: (1) safe data, addressing privacy at the input level, (2) safe settings, addressing privacy during shared processing, and (3) safe outputs, addressing privacy protection of analysis results. Three additional axes address the usefulness of approaches: (4) support for de-duplication, to enable the reconciliation of data belonging to the same individuals, (5) flexibility, to be able to adapt to different data analysis requirements, and (6) scalability, to maintain performance with increasing complexity of shared data or common analysis processes. 
Results Using the systematization, we identified three different categories of approaches: distributed data analyses, which exchange anonymous aggregated data, secure multi-party computation protocols, which exchange encrypted data, and data enclaves, which store pooled individual-level data in secure environments for access for analysis purposes. We identified important research gaps, including a lack of approaches enabling the de-duplication of horizontally distributed data or providing a high degree of flexibility. Conclusions There are fundamental differences between different data sharing approaches and several gaps in their functionality that may be interesting to investigate in future work. Our systematization can make the properties of privacy-preserving data sharing infrastructures more transparent and support decision makers and regulatory authorities with a better understanding of the trade-offs taken.
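As a rough illustration of the first category, distributed data analyses that exchange aggregated data, the following Python sketch computes a pooled mean from per-site counts and sums, so that individual-level values never leave a site. The class `SiteAggregate`, the function names, and the numbers are hypothetical and not taken from the paper. Whether such aggregates are actually anonymous depends on safeguards the sketch does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteAggregate:
    """The only information a site shares: a count and a sum."""
    n: int
    total: float


def local_aggregate(values):
    """Runs at each site on its own individual-level data."""
    return SiteAggregate(n=len(values), total=float(sum(values)))


def pooled_mean(aggregates):
    """Runs at the coordinator, which sees aggregates only."""
    n = sum(a.n for a in aggregates)
    if n == 0:
        raise ValueError("no observations reported by any site")
    return sum(a.total for a in aggregates) / n


# Toy measurements held by two hypothetical sites.
site_a = [5.0, 7.0, 9.0]
site_b = [4.0, 6.0]

aggregates = [local_aggregate(site_a), local_aggregate(site_b)]
mean = pooled_mean(aggregates)

if __name__ == "__main__":
    print(f"pooled mean over {sum(a.n for a in aggregates)} records: {mean}")
```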

EasySMPC: a simple but powerful no-code tool for practical secure multiparty computation

et al. 2022

Background Modern biomedical research is data-driven and relies heavily on the re-use and sharing of data. Biomedical data, however, is subject to strict data protection requirements. Due to the complexity of the data required and the scale of data use, obtaining informed consent is often infeasible. Other methods, such as anonymization or federation, in turn have their own limitations. Secure multi-party computation (SMPC) is a cryptographic technology for distributed calculations, which brings formally provable security and privacy guarantees and can be used to implement a wide range of analytical approaches. As a relatively new technology, SMPC is still rarely used in real-world biomedical data sharing activities due to several barriers, including its technical complexity and lack of usability. Results To overcome these barriers, we have developed the tool EasySMPC, which is implemented in Java as a cross-platform, stand-alone desktop application provided as open-source software. The tool makes use of the SMPC method Arithmetic Secret Sharing, which allows pre-defined sets of variables to be summed securely among different parties in two rounds of communication (input sharing and output reconstruction), and integrates this method into a graphical user interface. No additional software services need to be set up or configured, as EasySMPC uses the most widespread digital communication channel available: e-mail. No cryptographic keys need to be exchanged between the parties, and e-mails are exchanged automatically by the software. To demonstrate the practicability of our solution, we evaluated its performance in a wide range of data sharing scenarios. The results of our evaluation show that our approach is scalable (summing up 10,000 variables between 20 parties takes less than 300 s) and that the number of participants is the essential factor.
Conclusions We have developed an easy-to-use “no-code solution” for performing secure joint calculations on biomedical data using SMPC protocols, which is suitable for use by scientists without IT expertise and which has no special infrastructure requirements. We believe that innovative approaches to data sharing with SMPC are needed to foster the translation of complex protocols into practice.
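The two rounds named in the abstract (input sharing and output reconstruction) can be sketched with additive secret sharing over a prime field. This is a minimal Python illustration of the general technique, not EasySMPC's Java implementation: the modulus `PRIME`, the function names, and the in-memory "message exchange" are assumptions, and the e-mail transport is left out entirely.

```python
import secrets

# Assumed modulus for illustration; the abstract does not state the
# parameters EasySMPC uses. Inputs must be integers in [0, PRIME).
PRIME = 2**61 - 1


def make_shares(value, n_parties):
    """Split value into n additive shares that sum to value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_inputs):
    """Simulate both communication rounds for one summed variable."""
    n = len(private_inputs)
    # Round 1 (input sharing): party i sends its j-th share to party j.
    shares = [make_shares(v, n) for v in private_inputs]
    # Each party j adds the shares it received; the random shares mask
    # every individual input.
    partial_sums = [sum(shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Round 2 (output reconstruction): partial sums are exchanged and added.
    return sum(partial_sums) % PRIME


inputs = [12, 30, 7]  # toy private values of three hypothetical parties
total = secure_sum(inputs)

if __name__ == "__main__":
    print("joint sum:", total)
```

Because every input is split into one share per party, the number of messages grows with the square of the number of parties, which fits the abstract's observation that the number of participants is the essential performance factor.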

A Comprehensive Portal for Clinical and Translational Data Warehouses

Johns, Müller, Wirth, et al. 2021

Data-driven methods in biomedical research can help to obtain new insights into the development, progression and therapy of diseases. Clinical and translational data warehouses such as Informatics for Integrating Biology and the Bedside (i2b2) and tranSMART are important solutions for this. Of the aspects addressed by the well-known FAIR data principles (findability, accessibility, interoperability and reusability), this paper focuses on findability. For this purpose, we describe a portal solution that acts as a catalogue for a wide range of data warehouse instances, featuring a central access point and links to training material, such as user manuals and video tutorials. Moreover, the portal provides an overview of the status of multiple warehouses for developers and a set of statistics about the data currently loaded. Due to its modular design and the use of modern web technologies, the portal is easy to extend and customize to reflect different corporate designs and institutional requirements.

Data Provenance in Biomedical Research: Scoping Review

Johns, Meurers, Wirth, et al. 2023. J Med Internet Res

Background Data provenance refers to the origin, processing, and movement of data. Reliable and precise knowledge about data provenance has great potential to improve reproducibility as well as quality in biomedical research and, therefore, to foster good scientific practice. However, despite the increasing interest in data provenance technologies in the literature and their implementation in other disciplines, these technologies have not yet been widely adopted in biomedical research. Objective The aim of this scoping review was to provide a structured overview of the body of knowledge on provenance methods in biomedical research by systematizing articles covering data provenance technologies developed for or used in this application area; describing and comparing the functionalities as well as the design of the provenance technologies used; and identifying gaps in the literature, which could provide opportunities for future research on technologies that could receive more widespread adoption. Methods Following a methodological framework for scoping studies and the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines, articles were identified by searching the PubMed, IEEE Xplore, and Web of Science databases and subsequently screened for eligibility. We included original articles covering software-based provenance management for scientific research published between 2010 and 2021. A set of data items was defined along the following five axes: publication metadata, application scope, provenance aspects covered, data representation, and functionalities. The data items were extracted from the articles, stored in a charting spreadsheet, and summarized in tables and figures. Results We identified 44 original articles published between 2010 and 2021. We found that the solutions described were heterogeneous along all axes.
We also identified relationships among motivations for the use of provenance information, feature sets (capture, storage, retrieval, visualization, and analysis), and implementation details such as the data models and technologies used. The important gap that we identified is that only a few publications address the analysis of provenance data or use established provenance standards, such as PROV. Conclusions The heterogeneity of provenance methods, models, and implementations found in the literature points to the lack of a unified understanding of provenance concepts for biomedical data. Providing a common framework, a biomedical reference, and benchmarking data sets could foster the development of more comprehensive provenance solutions.
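To give a sense of what capturing and analyzing provenance with PROV-style relations can look like, here is a small Python sketch. The relation names (`wasGeneratedBy`, `used`, `wasAssociatedWith`, `wasDerivedFrom`) follow the W3C PROV data model. The identifiers, the tuple representation, and the `lineage` helper are hypothetical and do not correspond to any tool covered by the review.

```python
from collections import defaultdict

# Provenance statements as (relation, subject, object) tuples.
# Relation names follow W3C PROV; all identifiers are made up.
relations = [
    ("wasGeneratedBy", "cleaned.csv", "cleaning-run"),
    ("used", "cleaning-run", "raw.csv"),
    ("wasAssociatedWith", "cleaning-run", "analyst"),
    ("wasDerivedFrom", "cleaned.csv", "raw.csv"),
    ("wasDerivedFrom", "figure.png", "cleaned.csv"),
]


def lineage(entity, rels):
    """Return every entity that `entity` was transitively derived from."""
    parents = defaultdict(list)
    for name, subject, obj in rels:
        if name == "wasDerivedFrom":
            parents[subject].append(obj)
    found, stack = [], [entity]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in found:
                found.append(parent)
                stack.append(parent)
    return found


ancestors = lineage("figure.png", relations)

if __name__ == "__main__":
    print("figure.png derives from:", ancestors)
```

A lineage query of this kind is one simple form of the provenance analysis that the review found few publications address.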
