The increasing availability and use of sensitive personal data raises a set of issues regarding the privacy of the individuals behind the data. These concerns become even more important when health data are processed, as such data are considered sensitive under most regulations worldwide. Privacy-enhancing technologies (PETs) attempt to protect the privacy of individuals while preserving the utility of the data. One of the most popular of these technologies in recent years is differential privacy (DP), which was used for the 2020 U.S. Census. Another trend is to combine synthetic data generators with DP to create so-called private synthetic data generators. The objective is to preserve the statistical properties of the original data as accurately as possible, while the generated records should differ as much as possible from the original records with respect to private features. While these technologies seem promising, there is a gap between academic research on DP and synthetic data and the practical application and evaluation of these techniques in real-world use cases. In this paper, we evaluate three private synthetic data generators (MWEM, DP-CTGAN, and PATE-CTGAN) with respect to their use-case-specific privacy and utility. The use case is the analysis of continuous heart rate measurements from different individuals. This work shows that private synthetic data generators offer tremendous advantages over traditional techniques, but also require an in-depth analysis tailored to the use case. Furthermore, each technology has different strengths, so there is no clear winner. However, DP-CTGAN often performs slightly better than the other technologies and can therefore be recommended for use cases involving continuous medical data.
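To make the idea of a private synthetic data generator more concrete, the following is a minimal, simplified sketch of MWEM (Multiplicative Weights Exponential Mechanism) for a single discretized attribute, for example heart rate binned into 10 bpm intervals. It is not the implementation evaluated in the paper. It assumes that the record count n is public, that the query workload consists of range-count queries, and it omits refinements of the published algorithm such as replaying earlier measurements. The histogram values and all function names are hypothetical.

```python
import math
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential draws is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def range_count(hist: list[float], query: tuple[int, int]) -> float:
    # A range query counts the records whose bin index lies in [lo, hi).
    lo, hi = query
    return sum(hist[lo:hi])


def mwem(hist: list[int], queries: list[tuple[int, int]], epsilon: float,
         rounds: int, rng: random.Random) -> list[float]:
    n = sum(hist)  # assumption: the record count n is treated as public
    size = len(hist)
    approx = [n / size] * size  # start from a uniform distribution
    averaged = [0.0] * size
    # Per round: eps_step for query selection, eps_step for measurement,
    # so the total budget is 2 * rounds * eps_step = epsilon.
    eps_step = epsilon / (2 * rounds)
    for _ in range(rounds):
        # 1) Exponential mechanism (sensitivity 1): favour queries that the
        #    current approximation answers badly. Subtracting the maximum
        #    error keeps math.exp from overflowing.
        errors = [abs(range_count(hist, q) - range_count(approx, q)) for q in queries]
        top = max(errors)
        weights = [math.exp(eps_step * (e - top) / 2) for e in errors]
        lo, hi = rng.choices(queries, weights=weights, k=1)[0]
        # 2) Laplace mechanism: noisy answer to the selected query.
        measured = range_count(hist, (lo, hi)) + laplace_noise(1.0 / eps_step, rng)
        # 3) Multiplicative weights update towards the noisy measurement.
        gap = measured - range_count(approx, (lo, hi))
        approx = [a * math.exp(gap / (2 * n)) if lo <= i < hi else a
                  for i, a in enumerate(approx)]
        total = sum(approx)
        approx = [a * n / total for a in approx]  # renormalise to n records
        averaged = [s + a / rounds for s, a in zip(averaged, approx)]
    return averaged


def sample_synthetic(distribution: list[float], count: int,
                     rng: random.Random) -> list[int]:
    # Synthetic records are bin indices drawn from the private distribution.
    return rng.choices(range(len(distribution)), weights=distribution, k=count)


rng = random.Random(0)
# Hypothetical heart-rate histogram: 10 bins of 10 bpm from 40 to 140 bpm.
true_hist = [2, 10, 35, 60, 45, 25, 12, 6, 3, 2]
queries = [(lo, hi) for lo in range(10) for hi in range(lo + 1, 11)]
dist = mwem(true_hist, queries, epsilon=1.0, rounds=10, rng=rng)
synthetic = sample_synthetic(dist, 200, rng)
```

Only the selection and measurement steps touch the real histogram, so the sampled records inherit the DP guarantee by post-processing; the choice of bins and queries determines which statistical properties the synthetic data can preserve.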
Cancer registries offer a systematic approach for the collection, storage, and management of data on persons with cancer and related diseases. In research and healthcare in general, much hope rests on such registry-based analyses, as they make it possible to comprehensively consider the characteristics of a highly diverse population. In addition to collecting data, cancer registries are responsible for data protection. To comply with legal regulations, access to the data must be strictly controlled, which sometimes leads to bureaucratic and slow processes. The situation is especially complicated in Germany, since cancer data are distributed across the numerous cancer registries of the individual federal states. A research team has to negotiate a separate contract with each cancer registry if a nationwide data evaluation is to be performed. In a joint effort by cancer registries and technical, medical, and economic experts, we propose a different solution for cooperative data processing. Our approach aims to combine data in a virtual pool based on the selection criteria of individual requests from researchers. To achieve this goal, we adapt the Fraunhofer Medical Data Space as the enabling technology. The proposed architecture will allow us to pool the data of multiple partners under the control of data access policies. In doing so, each data source can introduce its own rules and specifications on how its data are used. Additionally, we add a digital consent management system that will allow individual patients to decide how their data are used. Finally, we demonstrate the high potential of the cooperative analysis of distributed cancer data enabled by the proposed solution.
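The combination of registry-level access policies, patient consent, and request-specific selection criteria can be illustrated with a small, hypothetical sketch. It does not use or reproduce the Fraunhofer Medical Data Space API; the classes, fields, and the reduction of policies and consent to sets of permitted purposes are illustrative assumptions only, and the registry contents are invented. The pool holds references rather than copied records, mirroring the idea of a virtual pool.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    patient_id: str
    diagnosis: str                # e.g. an ICD-10 code such as "C50.9"
    consented_purposes: set[str]  # hypothetical per-patient digital consent


@dataclass
class Registry:
    name: str
    allowed_purposes: set[str]    # hypothetical registry-level usage policy
    records: list[Record] = field(default_factory=list)


@dataclass
class Request:
    purpose: str
    diagnosis_prefix: str         # the researcher's selection criterion


def virtual_pool(registries: list[Registry], request: Request) -> list[tuple[str, str]]:
    """Return (registry, patient_id) references instead of copying records."""
    pooled = []
    for registry in registries:
        # Each registry enforces its own usage policy first.
        if request.purpose not in registry.allowed_purposes:
            continue
        for record in registry.records:
            consented = request.purpose in record.consented_purposes
            selected = record.diagnosis.startswith(request.diagnosis_prefix)
            if consented and selected:
                pooled.append((registry.name, record.patient_id))
    return pooled


registries = [
    Registry("Registry A", {"research"}, [
        Record("a1", "C50.9", {"research"}),
        Record("a2", "C50.1", set()),        # no consent for research use
        Record("a3", "C61", {"research"}),   # does not match the selection
    ]),
    Registry("Registry B", set(), [          # policy does not permit research
        Record("b1", "C50.4", {"research"}),
    ]),
    Registry("Registry C", {"research"}, [
        Record("c1", "C50.2", {"research"}),
    ]),
]
pool = virtual_pool(registries, Request("research", "C50"))
```

Checking the registry policy before the individual consent reflects the layering described above: a source can exclude itself entirely, and within a participating source each patient's consent still decides whether a matching record enters the pool.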