Gossypium hirsutum has proven difficult to sequence owing to its complex allotetraploid (AtDt) genome. Here we produce a draft genome using 181-fold paired-end sequences assisted by fivefold BAC-to-BAC sequences and a high-resolution genetic map. In our assembly, 88.5% of the 2,173-Mb scaffolds, which cover 89.6% to 96.7% of the AtDt genome, are anchored and oriented to 26 pseudochromosomes. Comparison of this G. hirsutum AtDt genome with the already sequenced diploid Gossypium arboreum (AA) and Gossypium raimondii (DD) genomes revealed conserved gene order. Repeated sequences account for 67.2% of the AtDt genome, and transposable elements (TEs) originating from Dt seem more active than those from At. Reduction in the AtDt genome size occurred after allopolyploidization. The A or At genome may have undergone positive selection for fiber traits. Concerted evolution of different regulatory mechanisms for Cellulose synthase (CesA) and 1-Aminocyclopropane-1-carboxylic acid oxidase1 and 3 (ACO1,3) may be important for enhanced fiber production in G. hirsutum.
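The scaffold figures in the abstract can be checked with simple arithmetic. The sketch below is illustrative only: it assumes that the 89.6% to 96.7% coverage range is expressed relative to the 2,173-Mb scaffold total, and the function names are hypothetical, not taken from the paper.

```python
# Figures quoted in the abstract; everything else here is an illustrative assumption.
SCAFFOLD_TOTAL_MB = 2173.0        # total scaffold length
ANCHORED_FRACTION = 0.885         # share anchored to the 26 pseudochromosomes
COVERAGE_RANGE = (0.896, 0.967)   # share of the AtDt genome covered by scaffolds


def anchored_length_mb(total_mb: float, fraction: float) -> float:
    """Length of scaffold sequence placed on pseudochromosomes."""
    return total_mb * fraction


def implied_genome_size_mb(total_mb: float, coverage: float) -> float:
    """Genome size implied if `total_mb` covers `coverage` of the genome (assumed reading)."""
    return total_mb / coverage


anchored = anchored_length_mb(SCAFFOLD_TOTAL_MB, ANCHORED_FRACTION)
largest = implied_genome_size_mb(SCAFFOLD_TOTAL_MB, COVERAGE_RANGE[0])
smallest = implied_genome_size_mb(SCAFFOLD_TOTAL_MB, COVERAGE_RANGE[1])

print(f"Anchored scaffold length: {anchored:.0f} Mb")
print(f"Implied AtDt genome size: {smallest:.0f} to {largest:.0f} Mb")
```

Under that reading, roughly 1,923 Mb of sequence is anchored, and the implied genome size falls between about 2,250 and 2,430 Mb.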
Cotton is one of the most economically important crop plants worldwide. Its fiber, commonly known as cotton lint, is the principal natural source for the textile industry. Approximately 33 million ha (5% of the world's arable land) is used for cotton planting 1, with an annual global market value of textile mills of approximately $630.6 billion in 2011 (MarketPublishers; see URLs). Apart from its economic value, cotton is also an excellent model system for studying polyploidization, cell elongation and cell wall biosynthesis 2-5.

The Gossypium genus contains 5 tetraploid (AD1 to AD5, 2n = 4×) and over 45 diploid (2n = 2×) species (where n is the number of chromosomes in the gamete of an individual), which are believed to have originated from a common ancestor approximately 5-10 million years ago 6. Eight diploid subgenomes, designated as A to G and K, have been found across North America, Africa, Asia and Australia. The haploid genome size of diploid cottons (2n = 2× = 26) varies from about 880 Mb (G. raimondii Ulbrich) in the D genome to 2,500 Mb in the K genome 7,8. Diploid cotton species share a common chromosome number (n = 13), and high levels of synteny or colinearity are observed among them 9-12. The tetraploid cotton species (2n = 4× = 52), such as G. hirsutum L. and Gossypium barbadense L., are thought to have formed by an allopolyploidization event that occurred approximately 1-2 million years ago, which involved a D-genome species as the pollen-providing parent and an A-genome species as the maternal parent 13,14.

To gain insights into the cultivated polyploid genomes (how they have evolved and how their subgenomes interact), it is first necessary to have a basic knowledge of the structure of the component genomes. Therefore, we have created a draft sequence of the putative D-genome parent, G. raimondii, using DNA samples prepared from Cotton Microsatellite Database (CMD) 10 (refs. 15,16), a genetic standard originated from a single seed (accession D5-3) in 2004 and brought to near homozygosity by six successive generations of self-fertilization. We believe that sequencing of the G. raimondii genome will not only provide a major source of candidate genes important for the genetic improvement of cotton quality and productivity, but it may also serve as a reference for the assembly of the tetraploid G. hirsutum genome.

RESULTS

Sequencing and assembly

A whole-genome shotgun strategy was used to sequence and assemble the G. raimondii genome. A total of 78.7 Gb of next-generation Illumina paired-end 50-bp, 100-bp and 150-bp reads was generated by sequencing genome shotgun libraries of different fragment lengths (170 bp, 250 bp, 500 bp, 800 bp, 2 kb, 5 kb, 10 kb, 20 kb and 40 kb) that covered 103.6-fold of the 775.2-Mb assembled G. raimondii genome (Supplementary Table 1). The resulting assembly appeared to cover a very large proportion of the euchromatin of the G. raimondii genome. The unassembled genomic regions are likely to contain heterochromatic satellites, large repetitive sequences or ribosoma...
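Fold coverage in the sequencing description above is the total number of sequenced bases divided by the genome (here, assembly) size. The following is a minimal sketch under the assumption that Gb and Mb are decimal units (10^9 and 10^6 bases); the function name is hypothetical.

```python
def fold_coverage(total_read_bases: float, genome_size_bases: float) -> float:
    """Average sequencing depth: sequenced bases per base of genome."""
    return total_read_bases / genome_size_bases


# Values quoted in the text; decimal units are an assumption of this sketch.
READ_BASES = 78.7e9        # 78.7 Gb of Illumina paired-end reads
ASSEMBLY_BASES = 775.2e6   # 775.2-Mb assembled G. raimondii genome

depth = fold_coverage(READ_BASES, ASSEMBLY_BASES)
print(f"Naive fold coverage: {depth:.1f}x")
```

With these inputs the naive ratio comes out at about 101.5-fold, close to but not identical to the reported 103.6-fold. The gap may reflect unit conventions or rounding in the source figures, which this sketch does not attempt to resolve.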