Haplotype variation involves not only SNPs but also insertions and deletions, in particular gene copy number variations. However, comparisons of individual genomes have been difficult because traditional sequencing methods produce reads that are too short to unambiguously reconstruct chromosomal regions containing repetitive DNA sequences. An example of such a case is the storage protein gene family in maize that acts as a sink for reduced nitrogen in the seed. Previously, 41–48 gene copies of the alpha zein gene family, spread over six loci spanning chromosomal regions of 30 to 500 kb, were described in two Iowa Stiff Stalk (SS) inbreds. Analyses of those regions relied on overlapping BAC clones, generated by an expensive and labor-intensive approach. Here we used single-molecule real-time (Pacific Biosciences) shotgun sequencing to assemble the six chromosomal regions of the Non-Stiff Stalk maize inbred W22 from a single DNA sequence dataset. To validate the reconstructed regions, we developed an optical map (BioNano genome map; BioNano Genomics) of W22 and found agreement between the two datasets. Using the sequences of full-length cDNAs from W22, we found that the error rate of PacBio sequencing appeared to be less than 0.1% after autocorrection and assembly. Expressed genes, some with premature stop codons, are interspersed with nonexpressed genes, giving rise to genotype-specific expression differences. Alignment of these regions with the previously analyzed regions of SS lines reveals, in part, dramatic differences between these two heterotic groups.

shotgun DNA sequencing | haplotype variation | gene copy number | transposable elements | maize genome

Ultimately, all traits are manifested in the gene content of the genome. Despite the complex nature of the regulation of mRNA synthesis and turnover, cDNA sequencing had been used as an economical approach to determine the gene content of the human genome (1).
However, to achieve a better overview of all genes and their chromosomal organization, whole-genome sequencing of the human genome became essential (2). Still, the identification of all genes and their order on the chromosomes of eukaryotic species has been hampered by the presence of repetitive DNA in large genomes. Repetitive DNA comprises not only transposable elements but also gene families, which can vary in copy number even within the same species, generating haplotypes with altered gene expression (3, 4). It has also become clear that genes and transposable elements are intermixed, so one cannot be sequenced separately from the other; they are contiguous in nature. This problem was overcome by constructing genomic libraries in the form of yeast and bacterial artificial chromosomes, which were used to build physical maps before the clones were sequenced individually (5, 6). An advantage was that such clone collections could be sequenced by large consortia and organized as community efforts (7). A disadvantage was the lack of completion and the enor...