Current practices in collaborative genomic data analysis (e.g. PCAWG [1]) necessitate all involved parties to exchange individual patient data and perform all analysis locally, or use a trusted server for maintaining all data to perform analysis in a single site (e.g. the Cancer Genome Collaboratory). Since both approaches involve sharing genomic sequence data -which is typically not feasible due to privacy issues, collaborative data analysis remains to be a rarity in genomic medicine.In order to facilitate efficient and effective collaborative or remote genomic computation we introduce SkSES (Sketching algorithms for Secure Enclave based genomic data analysiS), a computational framework for performing data analysis and querying on multiple, individually encrypted genomes from several institutions in an untrusted cloud environment. Unlike other techniques for secure/privacy preserving genomic data analysis, which typically rely on sophisticated cryptographic techniques with prohibitively large computational overheads, SkSES utilizes the secure enclaves supported by current generation microprocessor architectures such as Intel's SGX. The key conceptual contribution of SkSES is its use of sketching data structures that can fit in the limited memory available in a secure enclave.While streaming/sketching algorithms have been developed for many applications in computer science, their feasibility in genomics has remained largely unexplored. On the other hand, even though privacy and security issues are becoming critical in genomic medicine, available cryptographic techniques based on, e.g. homomorphic encryption or garbled circuits, fail to address the performance demands of this rapidly growing field. The alternative offered by Intel's SGX, a combination of hardware and software solutions for secure data analysis, is severely limited by the relatively small size of a secure enclave, a private region of the memory protected from other processes. SkSES addresses this limitation through the use of sketching data structures to support efficient secure and privacy preserving SNP analysis across individually encrypted VCF files from multiple institutions. In particular SkSES provides the users the ability to query for the "k most significant SNPs" among any set of user specified SNPs and any value of k -even when the total number of SNPs to be maintained is far beyond the memory capacity of the secure enclave. Results: We tested SkSES on the complete iDASH-2017 competition data set comprised of 1000 case and 1000 control samples related to an unknown phenotype. SkSES was able to identify the top SNPs with respect to the χ 2 statistic, among any user specified subset of SNPs across this data set of 2000 individually encrypted complete human genomes quickly and accurately -demonstrating the feasibility of secure and privacy preserving computation for genomic medicine via Intel's SGX.
Genotype imputation is an essential tool in genetics research, whereby missing genotypes are inferred based on a panel of reference genomes to enhance the power of downstream analyses. Recently, public imputation servers have been developed to allow researchers to leverage increasingly large-scale and diverse genetic data repositories for imputation. However, privacy concerns associated with uploading one's genetic data to a third-party server greatly limit the utility of these services. In this paper, we introduce a practical, secure hardware-based solution for a privacy-preserving imputation service, which keeps the input genomes private from the service provider by processing the data only within a Trusted Execution Environment (TEE) offered by the Intel SGX technology. Our solution features SMac, an efficient, side-channel-resilient imputation algorithm designed for Intel SGX, which employs the hidden Markov model (HMM)-based imputation strategy also utilized by a state-of-the-art imputation software Minimac. SMac achieves imputation accuracies virtually identical to those of Minimac and provides protection against known attacks on SGX while maintaining scalability to large datasets. We additionally show the necessity of our strategies for mitigating side-channel risks by identifying vulnerabilities in existing imputation software and controlling their information exposure. Overall, our work provides a guideline for practical and secure implementation of genetic analysis tools in SGX, representing a step toward privacy-preserving analysis services that can facilitate data sharing and accelerate genetics research.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.