Genomic data poses serious interdependent privacy risks: your data might also leak information about your family members' data. We discuss the methods attackers use to infer genomic information, as well as recent proposals for enhancing genomic privacy.
Individuals who want to control their personal data face significant interdependent privacy risks, that is, risks that involve the leakage of one's personal data due to data shared by other individuals. With recent advances in whole genome sequencing, genomic data in particular poses serious interdependent privacy risks.
Genomic data has many unique characteristics: it is highly valuable, is an individual's distinctive fingerprint, rarely changes throughout an individual's lifetime, is nonrevocable, and includes sensitive information about an individual (such as disease status or physical characteristics). 1,2 But the main reason genomic data poses interdependent privacy risks is that it is correlated among family members. Thus, one person's genome-related data (for instance, a raw genome, variant call format file, genomic test results, or aggregate statistics) might leak information about the genome-related data of his or her family members.
This issue goes all the way back to the DNA dragnets that first raised serious concerns among privacy advocates. Here, we present recent developments on the information security front, including
■ how attackers can infer an individual's genomic data from the partial genomes of his or her family members, background knowledge about genomics (simple statistics, high-order correlations, and so on), and the individual's phenotypic information;
■ how attackers can determine an individual's membership in a particular genomic dataset (for example, a beacon) from only the results of basic queries to that dataset and partial genomic knowledge about the individual's family members;
■ how attackers can deanonymize the deidentified genomes in a public dataset by using kinship information; and
■ how attackers can efficiently infer kinship from public anonymous genomic databases.
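To make the family correlation concrete, the following minimal Python sketch shows how knowing one parent's genotype at a single variant changes what an attacker can say about a child. It is an illustration under simplifying assumptions, not the inference method of any particular attack: it considers one biallelic variant, assumes Mendelian inheritance without mutation, models an unobserved parent with Hardy-Weinberg genotype frequencies, and ignores correlations between variants. The function names and the minor allele frequency of 1/10 are hypothetical.

```python
from fractions import Fraction


def transmit_prob(genotype, maf):
    """Probability that a parent passes the minor allele to a child.

    genotype: 0, 1, or 2 copies of the minor allele, or None if unobserved.
    maf: population minor allele frequency, used only when genotype is None.
    """
    if genotype is None:
        # Averaging genotype/2 over Hardy-Weinberg frequencies yields the MAF.
        return Fraction(maf)
    if genotype not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1, 2, or None")
    return Fraction(genotype, 2)


def child_genotype_distribution(mother, father, maf):
    """Distribution over the child's minor-allele count (0, 1, or 2).

    Each parent transmits one allele independently (Mendelian inheritance).
    """
    m = transmit_prob(mother, maf)
    f = transmit_prob(father, maf)
    return {
        0: (1 - m) * (1 - f),
        1: m * (1 - f) + (1 - m) * f,
        2: m * f,
    }


maf = Fraction(1, 10)  # hypothetical population frequency of the minor allele

# Attacker knows nothing about the family: only population statistics apply.
prior = child_genotype_distribution(None, None, maf)

# Attacker learns that the mother carries two copies of the minor allele.
posterior = child_genotype_distribution(2, None, maf)

print("without family data:", prior)
print("with mother's genotype:", posterior)
```

With these assumed numbers, the probability that the child carries at least one copy of the minor allele rises from 19/100 to 1 once the mother's genotype is known, even though the child has shared nothing.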