Direct-to-consumer (DTC) genetics services are increasingly popular for genetic genealogy, with tens of millions of customers as of 2019. Several DTC genealogy services allow users to upload their own genetic datasets in order to search for genetic relatives. A user and a target person in the database are identified as genetic relatives if the user's uploaded genome shares one or more sufficiently long segments in common with that of the target person, that is, if the two genomes share one or more long regions identical by state (IBS). IBS matches reveal some information about the genotypes of the target person, particularly if the chromosomal locations of IBS matches are shared with the uploader. Here, we describe several methods by which an adversary who wants to learn the genotypes of people in the database can do so by uploading multiple datasets.
Depending on the methods used for IBS matching and the information about IBS segments returned to the user, substantial information about users' genotypes can be revealed with a few hundred uploaded datasets. For example, using a method we call IBS tiling, we estimate that an adversary who uploads approximately 900 publicly available genomes could recover at least one allele at SNP sites across up to 82% of the genome of a median person of European ancestries.
In databases that detect IBS segments using unphased genotypes, approximately 100 uploads of falsified datasets can reveal enough genetic information to allow accurate genome-wide imputation of every person in the database. We provide simple-to-implement suggestions that will prevent the exploits we describe and discuss our results in light of recent trends in genetic privacy, including the recent use of uploads to DTC genetic genealogy services by law enforcement.
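To make the idea of IBS tiling concrete, the following is a minimal sketch under simplifying assumptions that are ours and not taken from any service's actual implementation. Genotypes are coded as alternate-allele counts (0, 1, 2), two unphased genotypes are treated as IBS at a site unless they are opposite homozygotes, and a segment is reported when a run of compatible sites reaches a hypothetical threshold `min_sites` (real services typically use thresholds in centimorgans). The function names `ibs_segments` and `tiling_coverage` are illustrative only.

```python
from typing import List, Sequence, Tuple


def ibs_segments(
    a: Sequence[int], b: Sequence[int], min_sites: int
) -> List[Tuple[int, int]]:
    """Return half-open [start, end) runs of sites where the two unphased
    genotypes share at least one allele (no opposite homozygotes) and the
    run spans at least `min_sites` sites."""
    n = min(len(a), len(b))
    segments: List[Tuple[int, int]] = []
    start = None
    for i in range(n):
        # Opposite homozygotes (0 vs 2) are the only incompatible pair.
        compatible = abs(a[i] - b[i]) != 2
        if compatible:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_sites:
                segments.append((start, i))
            start = None
    # Close a run that extends to the last site.
    if start is not None and n - start >= min_sites:
        segments.append((start, n))
    return segments


def tiling_coverage(
    target: Sequence[int], uploads: Sequence[Sequence[int]], min_sites: int
) -> float:
    """Fraction of the target's sites covered by at least one reported
    IBS segment across all uploaded genomes."""
    if len(target) == 0:
        return 0.0
    covered = [False] * len(target)
    for upload in uploads:
        for start, end in ibs_segments(upload, target, min_sites):
            for i in range(start, end):
                covered[i] = True
    return sum(covered) / len(target)


# Toy example: two uploads tile different parts of an 8-site target.
target = [0, 0, 2, 2, 0, 0, 1, 1]
uploads = [
    [0, 0, 0, 0, 2, 2, 2, 2],
    [2, 2, 2, 2, 2, 2, 0, 0],
]
coverage = tiling_coverage(target, uploads, min_sites=2)
```

In this toy case each upload matches the target over different sites, and the union of reported segments covers more of the target than either upload alone. This is the sense in which many uploads can "tile" a target's genome when segment locations are returned to the uploader.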
(Regalado, 2019). One of the major applications of DTC genetics has been genetic genealogy.
Customers of companies such as 23andMe and Ancestry, once they are genotyped, can view a list of other customers who are likely to be genetic relatives. These putative relatives' full names are often given, and sometimes contact details are given as well. Such genealogical matching services are of interest to people who want to find distant genetic relatives to extend their family tree, and can be particularly useful to people who otherwise may not have information about their genetic relatives, such as adoptees or the biological children of sperm donors. Several genetic genealogy services, including GEDmatch, MyHeritage, FamilyTreeDNA, and LivingDNA (Table 1), allow users to upload their own genetic data if they have been genotyped by another company. These entities generally offer some subset of their services at no charge to uploaders, which helps to grow their databases. Upload services have also been used by law enforcement, with the goal of identifying relatives of the source of a crime-scene sample (Erlich et al., 2018; Edge and Coop, 2019), prompting discussion abo...