Cancer registries offer a systematic approach to the collection, storage, and management of data on persons with cancer and related diseases. Research and healthcare in general place much hope in such registry-based analyses, because they can comprehensively capture the features of a highly diverse population. Besides collecting data, cancer registries are responsible for protecting it. To comply with legal regulations, access to the data has to be strictly controlled, which sometimes leads to bureaucratic and slow processes. The situation is especially complicated in Germany, since cancer data is distributed over numerous federal cancer registries. A research team that wants to perform a nationwide data evaluation has to negotiate a separate contract with each cancer registry. In a joint effort of cancer registries and technical, medical, and economic experts, we propose a different solution for cooperative data processing. Our approach combines data in a virtual pool based on the selection criteria of individual requests from researchers. To achieve this goal, we adapt the Fraunhofer Medical Data Space as the enabling technology. The proposed architecture allows data from multiple partners to be pooled under the control of data access policies, so that each data source can introduce its own rules and specifications on how its data is used. Additionally, we add digital consent management that allows individual patients to decide how their data is used. Finally, we show the high potential of the cooperative analysis of distributed cancer data supported by the proposed solution.
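The idea of a virtual pool governed by per-source access policies and patient consent can be illustrated with a minimal Python sketch. It is not the Fraunhofer Medical Data Space API: the names `Record`, `Registry`, `query`, and `virtual_pool`, as well as the notion of a "purpose" string, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Set


@dataclass(frozen=True)
class Record:
    """One cancer case held by a registry (hypothetical, simplified schema)."""
    patient_id: str
    diagnosis: str                 # e.g. an ICD-10 code
    year: int
    consent: FrozenSet[str]        # purposes this patient has agreed to


@dataclass
class Registry:
    """A data source that enforces its own access policy (assumed model)."""
    name: str
    records: List[Record]
    allowed_purposes: Set[str]     # registry-level rule on how data may be used

    def query(self, purpose: str, criteria: Callable[[Record], bool]) -> List[Record]:
        # Registry policy: refuse the whole request if the purpose is not allowed.
        if purpose not in self.allowed_purposes:
            return []
        # Patient consent: release only records whose owner agreed to the purpose.
        return [r for r in self.records if purpose in r.consent and criteria(r)]


def virtual_pool(registries: List[Registry], purpose: str,
                 criteria: Callable[[Record], bool]) -> List[Record]:
    """Assemble a request-specific pool; each source filters its own data."""
    pooled: List[Record] = []
    for registry in registries:
        pooled.extend(registry.query(purpose, criteria))
    return pooled
```

In this sketch the pool exists only for the duration of one request, and every registry applies its own rules before any record leaves it, which mirrors the decentralised control described above.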
Data from the Association of Statutory Health Physicians in Westfalen-Lippe (KVWL) are used to enumerate the cohort of women in Westfalen-Lippe (WL) who are entitled to participate in the mammography screening program (MSP) and to record their use of curative mammography outside of the MSP. The EKR-NRW provides epidemiological and medical data on all breast cancer (BC) cases in WL, on cohort mortality, and on causes of death. The central MSP database MaSc provides the screening history of all MSP participants. The established uniform encryption methods employed in the EKR-NRW are used to link records from the three data sources in one data-merging center (DZS). To this end, data are first captured in standardized formats, aggregated at several levels, and transferred in encrypted form; they are then checked for anonymity and diversity level while still encrypted, and finally stored in a de facto anonymized manner in the evaluation center (ES). Researchers can obtain data sets with plain-text epidemiological and medical data from the ES for analyses.
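Linking records from several sources without exchanging plain-text identities can be sketched as follows. The actual encryption methods of the EKR-NRW are not described here, so this example swaps in a keyed hash (HMAC-SHA256) as a stand-in; the functions `normalise`, `pseudonym`, and `link`, the choice of identifying fields, and the shared `secret_key` are all assumptions. It performs exact matching only, not probabilistic linkage.

```python
import hashlib
import hmac
import unicodedata


def normalise(value: str) -> str:
    """Lower-case, strip diacritics and punctuation so spelling variants agree."""
    decomposed = unicodedata.normalize("NFKD", value.strip().lower())
    return "".join(ch for ch in decomposed if ch.isalnum())


def pseudonym(secret_key: bytes, surname: str, given_name: str, birth_date: str) -> str:
    """Derive an irreversible pseudonym from identifying data (assumed field set)."""
    message = "|".join(normalise(part) for part in (surname, given_name, birth_date))
    return hmac.new(secret_key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def link(*sources: dict) -> dict:
    """Merge attribute dicts from several sources that share the same pseudonym."""
    merged: dict = {}
    for source in sources:
        for pid, attributes in source.items():
            merged.setdefault(pid, {}).update(attributes)
    return merged
```

Each data source would compute the pseudonym locally with the same key and send only the pseudonym plus its evaluation attributes to the merging center, which can then call `link` without ever seeing names or birth dates.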
Background: The evaluation of population-based screening programs, such as the German Mammography Screening Program (MSP), requires collecting and linking data from population-based cancer registries and other sources of the healthcare system at a case-specific level. To link such sensitive data, we developed a method that complies with German data protection regulations and does not require written individual consent.
Methods: Our method combines probabilistic record linkage on encrypted identifying data with ‘blinded anonymisation’. It ensures that all data are either encrypted or have a defined and measurable degree of anonymity. The data sources use software that transforms plain-text identifying data into a set of irreversibly encrypted person cryptograms, while the evaluation attributes are aggregated in multiple stages and reversibly encrypted. A pseudonymisation service encrypts the person cryptograms into record assignment numbers, which a downstream data-collecting centre uses to perform the probabilistic record linkage. Blinded anonymisation solves the problem of quasi-identifiers within the evaluation data: it allows a specific set of the encrypted aggregations to be selected to produce a data export with ensured k-anonymity, without exposing any plain-text information. These data are finally transferred to an evaluation centre, where they are decrypted and analysed. Our approach allows several such generalisations to be created, each with a different resulting suppression rate, so that information depth can be balanced dynamically against privacy protection; it also highlights how this balance affects data analysability.
Results: German data protection authorities approved our concept for evaluating the impact of the German MSP on breast cancer mortality. We implemented a prototype and tested it with 1.5 million simulated records containing realistically distributed identifying data, and calculated different generalisations and their respective suppression rates. We also discuss limitations for large data sets in the cancer registry domain, as well as approaches for further improvement, such as l-diversity and ways to reduce the amount of manual post-processing.
Conclusion: Our approach enables secure linking of data from population-based cancer registries and other sources of the healthcare system. Despite some limitations, it enables evaluation of the German MSP and can be generalised to other projects.
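The trade-off between generalisation depth and suppression rate under k-anonymity can be made concrete with a small Python sketch. It works on plain-text toy records, whereas the method described above operates on encrypted aggregations, so it illustrates only the trade-off and not the blinded procedure itself. The helper names `generalise_age` and `k_anonymise`, and the use of age and postcode as quasi-identifiers, are assumptions.

```python
from collections import Counter
from typing import Callable, Hashable, List, Tuple


def generalise_age(age: int, width: int) -> str:
    """Replace an exact age by a band of the given width, e.g. 52 -> '50-54'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def k_anonymise(records: List[dict], k: int,
                quasi_identifier: Callable[[dict], Hashable]) -> Tuple[List[dict], float]:
    """Suppress records whose generalised quasi-identifier occurs fewer than k times.

    Returns the retained records and the suppression rate (share of records dropped).
    """
    keys = [quasi_identifier(record) for record in records]
    counts = Counter(keys)
    kept = [record for record, key in zip(records, keys) if counts[key] >= k]
    suppression_rate = 1 - len(kept) / len(records) if records else 0.0
    return kept, suppression_rate


def fine(record: dict) -> tuple:
    """Detailed generalisation: 5-year age bands, 3-digit postcode prefix."""
    return (generalise_age(record["age"], 5), record["postcode"][:3])


def coarse(record: dict) -> tuple:
    """Coarser generalisation: 10-year age bands, 2-digit postcode prefix."""
    return (generalise_age(record["age"], 10), record["postcode"][:2])
```

Running `k_anonymise` on the same records with `fine` and then with `coarse` shows the expected pattern: the coarser generalisation keeps more records but reveals less detail per record.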