With the increasing affordability and availability of patient data, hospitals tend to outsource their data to cloud service providers (CSPs) for the purpose of storage and analytics. However, the concern of data privacy significantly limits the data owners' choice. In this work, we propose the first solution, to the best of our knowledge, that allows a CSP to perform efficient identification of target patients (e.g., pre-processing for a genome-wide association study -GWAS) over multi-tenant encrypted phenotype data (owned by multiple hospitals or data owners). We first propose an encryption mechanism for phenotype data, where each data owner is allowed to encrypt its data with a unique secret key. Moreover, the ciphertext supports privacy-preserving search and, consequently, enables the selection of the target group of patients (e.g., case and control groups). In addition, we provide a per-query based authorization mechanism for a client to access and operate on the data stored at the CSP. Based on the identified patients, the proposed scheme can either (i) directly conduct GWAS (i.e., computation of statistics about genomic variants) at the CSP or (ii) provide the identified groups to the client to directly query the corresponding data owners and conduct GWAS using existing distributed solutions. We implement the proposed scheme and run experiments over a real-life genomic dataset to show its effectiveness. The result shows that the proposed solution is capable to efficiently identify the case/control groups in a privacy-preserving way.
CCS CONCEPTS• Security and privacy → Management and querying of encrypted data.