Background
Clinical and social trials create evidence that enables medical progress. However, the gathering of personal and patient data requires high security and privacy standards. Direct linking of personal information and medical data is commonly hidden through pseudonymization. While this makes unauthorized access to personal medical data more difficult, a centralized pseudonymization list can still pose a security risk. In addition, medical data linked via pseudonyms can still be used for data-driven reidentification.
Objective
Our objective was to propose a novel approach to pseudonymization based on public-private key cryptography that allows (1) decentralized patient-driven creation and maintenance of pseudonyms, (2) 1-time pseudonymization of each data record, and (3) grouping of patient data records even without knowing the pseudonymization key.
Methods
Based on public-private key cryptography, we set up a signing mechanism for patient data records and detailed the workflows for (1) user registration, (2) user log-in, (3) record storing, and (4) record grouping. We evaluated the proposed mechanism for performance, examined the potential risks based on cryptographic collision, and carried out a threat analysis.
Results
The performance analysis showed that all workflows could be performed with an average runtime of 0.057 to 42.320 ms (user registration), 0.083 to 0.606 ms (record creation), and 0.005 to 0.198 ms (record grouping) depending on the chosen cryptographic tools. We expected no realistic risk of cryptographic collision in the proposed system, and the threat analysis revealed that 3 distinct server systems of the proposed setup had to be compromised to allow access to combined medical data and private data. However, this would still allow only for data-driven deidentification. For a full reidentification, all 3 trial servers and all study participants would have to be compromised. In addition, the approach supports consent management, automatically anonymizes the data after trial closure, and provides basic mechanisms against data forging.
Conclusions
The proposed approach has a high security and privacy level in comparison with traditional centralized pseudonymization approaches and does not require a trusted third party. The only drawback in comparison with central pseudonymization is the directed feedback of accidental findings to individual participants, as this is not possible with a quasi-anonymous storage of patient data.