Genome sequencing technology has advanced at a rapid pace and it is now possible to generate highly-detailed genotypes inexpensively. The collection and analysis of such data has the potential to support various applications, including personalized medical services. While the benefits of the genomics revolution are trumpeted by the biomedical community, the increased availability of such data has major implications for personal privacy; notably because the genome has certain essential features, which include (but are not limited to) (i) an association with traits and certain diseases, (ii) identification capability (e.g., forensics), and (iii) revelation of family relationships. Moreover, direct-to-consumer DNA testing increases the likelihood that genome data will be made available in less regulated environments, such as the Internet and for-profit companies. The problem of genome data privacy thus resides at the crossroads of computer science, medicine, and public policy. While the computer scientists have addressed data privacy for various data types, there has been less attention dedicated to genomic data. Thus, the goal of this paper is to provide a systematization of knowledge for the computer science community. In doing so, we address some of the (sometimes erroneous) beliefs of this field and we report on a survey we conducted about genome data privacy with biomedical specialists. Then, after characterizing the genome privacy problem, we review the state-of-the-art regarding privacy attacks on genomic data and strategies for mitigating such attacks, as well as contextualizing these attacks from the perspective of medicine and public policy. This paper concludes with an enumeration of the challenges for genome data privacy and presents a framework to systematize the analysis of threats and the design of countermeasures as the field moves forward.
In this paper, we propose privacy-enhancing technologies for medical tests and personalized medicine methods that use patients' genomic data. Focusing on genetic diseasesusceptibility tests, we develop a new architecture (between the patient and the medical unit) and propose a "privacypreserving disease susceptibility test" (PDS) by using homomorphic encryption and proxy re-encryption. Assuming the whole genome sequencing to be done by a certified institution, we propose to store patients' genomic data encrypted by their public keys at a "storage and processing unit" (SPU). Our proposed solution lets the medical unit retrieve the encrypted genomic data from the SPU and process it for medical tests and personalized medicine methods, while preserving the privacy of patients' genomic data. We also quantify the genomic privacy of a patient (from the medical unit's point of view) and show how a patient's genomic privacy decreases with the genetic tests he undergoes due to (i) the nature of the genetic test, and (ii) the characteristics of the genomic data. Furthermore, we show how basic policies and obfuscation methods help to keep the genomic privacy of a patient at a high level. We also implement and show, via a complexity analysis, the practicality of PDS.
Genomics data introduce a substantial computational burden as well as data privacy and ownership issues. Data sets generated by high-throughput sequencing platforms require immense amounts of computational resources to align to reference genomes and to call and annotate genomic variants. This problem is even more pronounced if reanalysis is needed for new versions of reference genomes, which may impose high loads to existing computational infrastructures. Additionally, after the compute-intensive analyses are completed, the results are either kept in centralized repositories with access control, or distributed among stakeholders using standard file transfer protocols. This imposes two main problems: (1) Centralized servers become gatekeepers of the data, essentially acting as an unnecessary mediator between the actual data owners and data users; and (2) servers may create single points of failure both in terms of service availability and data privacy. Therefore, there is a need for secure and decentralized platforms for data distribution with user-level data governance. A new technology, blockchain, may help ameliorate some of these problems. In broad terms, the blockchain technology enables decentralized, immutable, incorruptible public ledgers. In this Perspective, we aim to introduce current developments toward using blockchain to address several problems in omics, and to provide an outlook of possible future implications of the blockchain technology to life sciences.
Abstract. Geneticists prefer to store patients' aligned, raw genomic data, in addition to their variant calls (compact and summarized form of the raw data), mainly because of the immaturity of bioinformatic algorithms and sequencing platforms. Thus, we propose a privacy-preserving system to protect the privacy of aligned, raw genomic data. The raw genomic data of a patient includes millions of short reads, each comprised of between 100 and 400 nucleotides (genomic letters). We propose storing these short reads at a biobank in encrypted form. The proposed scheme enables a medical unit (e.g., a pharmaceutical company or a hospital) to privately retrieve a subset of the short reads of the patients (which include a definite range of nucleotides depending on the type of the genetic test) without revealing the nature of the genetic test to the biobank. Furthermore, the proposed scheme lets the biobank mask particular parts of the retrieved short reads if (i) some parts of the provided short reads are out of the requested range, or (ii) the patient does not give consent to some parts of the provided short reads (e.g., parts revealing sensitive diseases). We evaluate the proposed scheme to show the amount of unauthorized genomic data leakage it prevents. Finally, we implement the proposed scheme and assess its practicality.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.