Identification of genomic loci and segments that are identical by descent (IBD) allows inference on problems such as relatedness detection, IBD disease mapping, heritability estimation and detection of recent or ongoing positive selection. Here, employing a novel statistical method, we use IBD to find signals of selection in the Maasai from Kinyawa, Kenya (MKK). In doing so, we demonstrate the advantage of statistical tools that can probabilistically estimate IBD sharing without having to thin genotype data because of linkage disequilibrium (LD), and that allow for both inbreeding and more than one allele to be shared IBD. We use our novel method, GIBDLD, to estimate IBD sharing between all pairs of individuals at all genotyped SNPs in the MKK, and, by looking for genomic regions showing excess IBD sharing in unrelated pairs, find loci that are known to have undergone recent selection (eg, the LCT gene and the HLA region) as well as many novel loci. Intriguingly, those loci that show the highest amount of excess IBD, with the exception of HLA, also show a substantial number of unrelated pairs sharing all four of their alleles IBD. In contrast to other IBD detection methods, GIBDLD provides accurate probabilistic estimates at each locus for all nine possible IBD sharing states between a pair of individuals, thus allowing for consanguinity, while also modeling LD, thus removing the need to thin SNPs. These characteristics will prove valuable for those doing genetic studies, and estimating IBD, in the wide variety of human populations.
European Journal of Human Genetics (2013) 21, 205-211; doi:10.1038/ejhg.2012.148; published online 11 July 2012
Keywords: identity by descent; natural selection; consanguinity; cryptic relatedness; relationship inference; SNPs
INTRODUCTION
The ability to discover recent positive selection in the human genome is one compelling reason to estimate identity by descent (IBD) in a cohort of largely unrelated individuals. In particular, IBD can be used to find selection on standing variation, a situation where many methods used for detecting selection may not perform well. 1 Additionally, other genetic questions can be well addressed by estimating IBD within a set of individuals. These include detection of unknown or mistaken relationships, 2-6 estimation of heritability and genomic partitioning of genetic variance, 7,8 and mapping by the identification of shared segments. 9-13 IBD, however, is not directly observed but must be inferred from the available data. Traditionally, the combination of a pedigree with genotype data enabled the efficient computation of IBD using either 'peeling' 14 or hidden Markov models (HMMs). 15-17 More recently, however, the large amounts of information made available from high-density SNP genotyping arrays have enabled estimation of IBD even for very distantly related pairs of individuals (ie, >10 generations) in the absence of pedigree information. These additional data, however, also present the difficulty of accommodating the linkage disequilibrium (LD...