We consider recombinant inbred lines obtained by crossing two given homozygous parents and then applying multiple generations of self-crossings or full-sib matings. The chromosomal content of any such line forms a mosaic of blocks, each inherited identically by descent from one of the parents, with the parental origin alternating from one block to the next. Quantifying the statistical properties of such mosaic genomes has remained an open challenge for many years. Here, we solve this problem by taking a continuous chromosome picture and assuming crossovers to be noninterfering. Using a continuous-time random walk framework and Markov chain theory, we determine the statistical properties of these identical-by-descent blocks. We find that successive block lengths are only very slightly correlated. Furthermore, the blocks on the ends of chromosomes are larger on average than the others, a feature understandable from the nonexponential distribution of block lengths.
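The setting summarized above can be illustrated with a small Monte Carlo sketch. It is not the analytical method of this work, only a simulation under stated assumptions: selfing only (no full-sib mating), a finite number of generations, chromosome length measured in Morgans, and noninterfering crossovers placed on each gamete as a Poisson process of rate one per Morgan. All function names (`meiosis`, `merge_segments`, `self_ril`, `block_lengths`) are hypothetical and introduced for illustration.

```python
import random

def merge_segments(segments):
    """Fuse adjacent segments that carry the same parental label."""
    merged = []
    for start, end, label in segments:
        if merged and merged[-1][2] == label:
            merged[-1] = (merged[-1][0], end, label)
        else:
            merged.append((start, end, label))
    return merged

def meiosis(hom_a, hom_b, length, rng):
    """Build one gamete from two homologs (lists of (start, end, label)).

    Assumption: crossovers are noninterfering, i.e. a Poisson process
    with rate 1 per Morgan along a chromosome of `length` Morgans.
    """
    crossovers = []
    x = rng.expovariate(1.0)
    while x < length:
        crossovers.append(x)
        x += rng.expovariate(1.0)
    bounds = [0.0] + crossovers + [length]
    homologs = (hom_a, hom_b)
    current = rng.randrange(2)  # homolog copied first, chosen at random
    gamete = []
    for lo, hi in zip(bounds, bounds[1:]):
        for start, end, label in homologs[current]:
            s, e = max(start, lo), min(end, hi)
            if s < e:
                gamete.append((s, e, label))
        current = 1 - current  # switch template at each crossover
    return merge_segments(gamete)

def self_ril(length, generations, rng):
    """Start from the F1 of two homozygous parents (labels 0 and 1), then self."""
    hom_a = [(0.0, length, 0)]
    hom_b = [(0.0, length, 1)]
    for _ in range(generations):
        # Two independent gametes from the same individual form the offspring.
        hom_a, hom_b = (meiosis(hom_a, hom_b, length, rng),
                        meiosis(hom_a, hom_b, length, rng))
    return hom_a, hom_b

def block_lengths(homolog):
    """Lengths of the identical-by-descent blocks along one homolog."""
    return [end - start for start, end, _ in homolog]

if __name__ == "__main__":
    rng = random.Random(0)
    hom_a, _ = self_ril(length=1.0, generations=10, rng=rng)
    print(block_lengths(hom_a))
```

Repeating `self_ril` over many independent lines and pooling `block_lengths` separately for terminal and internal blocks gives empirical distributions that can be compared with the analytical results.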
With the advent of dense genomic maps, in particular those based on single-nucleotide polymorphism (SNP) data, the study of haplotypes has become central to modern analyses in population genetics (Buckler and Gore 2007; Carlton 2007; Frazer et al. 2007; Mott 2007; Jakobsson et al. 2008; Bryc et al. 2010). Here, the term haplotype refers to the series of alleles that an individual carries on a chromosome pair at a collection of (possibly many) loci and contrasts with single-locus genotypes that were the objects of many past studies. Haplotypic information can be used for association studies (Gold et al. 2008), for diversity studies (Lindblad-Toh et al. 2005), or for recognizing signals of positive selection using various measures of haplotype homozygosity (Sabeti et al. 2002; Zhang et al. 2006; Lencz et al. 2007; Tang et al. 2007; Curtis et al. 2008). Many approaches capitalize on the apparent "block" structure of haplotypes (Stumpf 2002; Cardon and Abecasis 2003; Wall and Pritchard 2003; Altshuler et al. 2005; Zheng and McPeek 2007).
Various causes can be invoked to explain the apparent structuring of genomes into haplotype blocks (Tishkoff and Verrelli 2003; Zondervan and Cardon 2004; Pe'er et al. 2006), among which are recombination hotspots (Goldstein 2001; Jeffreys et al. 2001) and population structure (Pritchard et al. 2000; Grote 2007; Slate and Pemberton 2007). However, the situation is often complicated (Shifman et al. 2003; Yalcin et al. 2004; Cuppen 2005; Kauppi et al. 2005; Greenawalt et al. 2006; Moore et al. 2008). In particular, the theoretical properties of many of the objects mentioned above, e.g., haplotype block lengths, remain largely unknown. Often, the distribution of blocks is declared "nonrandom" (Curtis et al. 2008) although the null hypothesis is not clearly specified.
The task of determining statistical properties of chromosomal block structures has arisen in many different contexts.
These can be classified into two types according to the kind of populations considered and lead to different mathematical techniques. In the first class, one asks how the genome of one or more parents in a populatio...