2018
DOI: 10.1101/350231
Preprint

Re-identification of genomic data using long range familial searches

Abstract: Consumer genomics databases have reached the scale of millions of individuals. Recently, law enforcement investigators have started to exploit some of these databases to find distant familial relatives, which can lead to a complete re-identification. Here, we leveraged genomic data of 600,000 individuals tested with consumer genomics to investigate the power of such long-range familial searches. We project that half of the searches with European-descent individuals will result in a third cousin or closer match an…

Citations: cited by 5 publications (8 citation statements)
References: 21 publications
“…In one study, participants ranked reidentification as a top risk of data sharing (Oliver et al, 2012). In the present study, most PIs also reported concerns about the identifiability of certain data, reflecting literature showing this possibility (Erlich et al, 2018; Erlich & Narayanan, 2014; Shringarpure & Bustamante, 2015). However, most RCs did not believe that re-identification was a risk for participants or believed that the risk was similar to the risks of sharing other types of digital data.…”
Section: Discussion
Type: supporting
confidence: 71%
“…However, most RCs did not believe that re-identification was a risk for participants or believed that the risk was similar to the risks of sharing other types of digital data. This finding suggests a discordance between RC perceptions of risks associated with data sharing and those risks that have been published in the literature, particularly regarding the potential for participant re-identification (Erlich et al, 2018; Erlich & Narayanan, 2014; Shringarpure & Bustamante, 2015). This difference between the attitudes of PIs and RCs, and between RC attitudes and identified potential risks, could have a major effect on the consent process, as RCs are often directly involved in explaining and answering questions about risks and benefits of data sharing with study participants, while PIs are ultimately responsible for compliance with funder requirements.…”
Section: Attitudes Toward Participant Privacy
Type: mentioning
confidence: 86%
“…In fact, within the frame of this interpretation of the GDPR, the individual may not have even the option to “opt-out” (24). In fact, the exploitation of genealogy databases, or more broadly consumer genomics databases, allows to identify up to 60%, and soon nearly any US-individual of European-descent in the near future, using demographic identifiers, including research participants of public sequencing projects (25). These approaches have been recently successfully used by law enforcement agencies to identify criminals, posing significant ethical and legal challenges (26).…”
Section: Ethical Challenges the GDPR And Beyond
Type: mentioning
confidence: 99%
“…Because of recombination and mutation events, inherited, identical sequences of DNA (haplotypes) become shorter with increasing numbers of generations. These identical‐by‐descent segments can be used to infer BGA and this is the method used by most commercial genealogy service providers (Erlich, Shor, Carmi, & Pe'er, ; Erlich, Shor, Pe'er, & Carmi, ). The forensic community has instead focused on shorter genetic sequences, most often preferring the shortest possible markers of all: SNPs.…”
Section: Biogeographical Ancestry
Type: mentioning
confidence: 99%