Cotton is one of the most economically important crop plants worldwide. Its fiber, commonly known as cotton lint, is the principal natural source for the textile industry. Approximately 33 million ha (5% of the world's arable land) is used for cotton planting 1 , and the global textile mill market was valued at approximately $630.6 billion in 2011 (MarketPublishers; see URLs). Apart from its economic value, cotton is also an excellent model system for studying polyploidization, cell elongation and cell wall biosynthesis 2-5 .
The Gossypium genus contains 5 tetraploid (AD1 to AD5, 2n = 4×) and over 45 diploid (2n = 2×) species (where n is the number of chromosomes in the gamete of an individual), which are believed to have originated from a common ancestor approximately 5-10 million years ago 6 . The diploid species are classified into eight genome groups, designated A to G and K, and are distributed across North America, Africa, Asia and Australia. The haploid genome size of diploid cottons (2n = 2× = 26) ranges from about 880 Mb in the D genome (G. raimondii Ulbrich) to about 2,500 Mb in the K genome 7,8 . Diploid cotton species share a common chromosome number (n = 13) and show high levels of synteny and colinearity 9-12 . The tetraploid cotton species (2n = 4× = 52), such as G. hirsutum L. and G. barbadense L., are thought to have arisen from an allopolyploidization event approximately 1-2 million years ago, in which an A-genome species served as the maternal parent and a D-genome species as the pollen donor 13,14 . To understand how the cultivated polyploid genomes have evolved and how their subgenomes interact, it is first necessary to know the structure of the component genomes. We therefore generated a draft genome sequence of the putative D-genome parent, G. raimondii, using DNA samples prepared from Cotton Microsatellite Database (CMD) 10 (refs. 15,16), a genetic standard derived from a single seed (accession D5-3) in 2004 and brought to near homozygosity by six successive generations of self-fertilization. We expect the G. raimondii genome sequence not only to provide a major source of candidate genes for improving cotton quality and productivity but also to serve as a reference for assembling the tetraploid G. hirsutum genome.
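The chromosome-number notation used above (2n = 2× = 26, 2n = 4× = 52, n = 13) can be made explicit with a small calculation. As an assumption drawn from standard cytogenetic convention rather than from the text, x denotes the monoploid (base) number, here 13, so the somatic number is 2n = ploidy × x and the gametic number n is half of that. The `Karyotype` class and `BASE_NUMBER_X` constant are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

# Assumed base (monoploid) number x for Gossypium, inferred from n = 13 in the diploids.
BASE_NUMBER_X = 13


@dataclass(frozen=True)
class Karyotype:
    """Hypothetical helper relating ploidy level to chromosome counts."""

    ploidy: int  # number of base chromosome sets per somatic cell (2 for 2x, 4 for 4x)
    base: int = BASE_NUMBER_X

    @property
    def somatic(self) -> int:
        # 2n = ploidy * x
        return self.ploidy * self.base

    @property
    def gametic(self) -> int:
        # n = 2n / 2: a gamete carries half of the somatic complement
        return self.somatic // 2


diploid = Karyotype(ploidy=2)     # e.g. an A-genome or D-genome species
tetraploid = Karyotype(ploidy=4)  # e.g. G. hirsutum or G. barbadense (AD)

# An AD allotetraploid gamete combines one reduced A-genome set (13) and one reduced D-genome set (13).
ad_gamete = diploid.gametic + diploid.gametic

print(diploid.somatic, diploid.gametic)        # 26 13
print(tetraploid.somatic, tetraploid.gametic)  # 52 26
print(ad_gamete)                               # 26
```

Because n in the text is the gametic number, it equals 13 for the diploids but 26 for the tetraploids; keeping the base number x separate avoids conflating the two.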
RESULTS
Sequencing and assembly
A whole-genome shotgun strategy was used to sequence and assemble the G. raimondii genome. A total of 78.7 Gb of Illumina paired-end reads (50, 100 and 150 bp) was generated from whole-genome shotgun libraries with fragment lengths of 170 bp, 250 bp, 500 bp, 800 bp, 2 kb, 5 kb, 10 kb, 20 kb and 40 kb, representing 103.6-fold coverage of the 775.2-Mb assembled G. raimondii genome (Supplementary Table 1). The resulting assembly appears to cover most of the euchromatin of the G. raimondii genome. The unassembled genomic regions are likely to contain heterochromatic satellites, large repetitive sequences or ribosoma...