Allotetraploid cotton is an economically important natural-fiber-producing crop worldwide. After polyploidization, Gossypium hirsutum L. evolved to produce a higher fiber yield and to better survive harsh environments than Gossypium barbadense, which produces superior-quality fibers. The global genetic and molecular bases for these interspecies divergences were unknown. Here we report high-quality de novo-assembled genomes for these two cultivated allotetraploid species with pronounced improvement in repetitive-DNA-enriched centromeric regions. Whole-genome comparative analyses revealed that species-specific alterations in gene expression, structural variations and expanded gene families were responsible for speciation and the evolutionary history of these species. These findings help to elucidate the evolution of cotton genomes and their domestication history. The information generated not only should enable breeders to improve fiber quality and resilience to ever-changing environmental conditions but also can be translated to other crops for better understanding of their domestication history and use in improvement.
Background: SNPs are the most abundant polymorphism type, and have been explored in many crop genomic studies, including rice and maize. SNP discovery in allotetraploid cotton genomes has lagged behind that of other crops due to their complexity and polyploidy. In this study, genome-wide SNPs are detected systematically using next-generation sequencing and efficient SNP genotyping methods, and used to construct a linkage map and characterize the structural variations in polyploid cotton genomes. Results: We construct an ultra-dense inter-specific genetic map comprising 4,999,048 SNP loci distributed unevenly in 26 allotetraploid cotton linkage groups and covering 4,042 cM. The map is used to order tetraploid cotton genome scaffolds for accurate assembly of G. hirsutum acc. TM-1. Recombination rates and hotspots are identified across the cotton genome by comparing the assembled draft sequence and the genetic map. Using this map, genome rearrangements and centromeric regions are identified in tetraploid cotton by combining information from the publicly available G. raimondii genome with fluorescent in situ hybridization analysis. Conclusions: We report the genotype-by-sequencing method used to identify millions of SNPs between G. hirsutum and G. barbadense. We construct and use an ultra-dense SNP map to correct sequence mis-assemblies, merge scaffolds into pseudomolecules corresponding to chromosomes, detect genome rearrangements, and identify centromeric regions in allotetraploid cottons. We find that the centromeric retro-element sequence of tetraploid cotton derived from the D subgenome progenitor might have invaded the A subgenome centromeres after allotetraploid formation. This study serves as a valuable genomic resource for genetic research and breeding of cotton. Electronic supplementary material: The online version of this article (doi:10.1186/s13059-015-0678-1) contains supplementary material, which is available to authorized users.
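As an illustration of how recombination rates can be estimated by comparing physical positions in an assembly with positions on a genetic map, the sketch below computes cM per Mb between adjacent markers on one chromosome. It is a minimal sketch and not the authors' pipeline: the function name `recombination_rates`, the input layout, and the example coordinates are all hypothetical.

```python
def recombination_rates(markers):
    """Estimate local recombination rate between adjacent markers.

    markers: list of (physical_bp, genetic_cM) tuples for ONE chromosome
             (hypothetical input layout, not taken from the paper).
    Returns a list of (start_bp, end_bp, cM_per_Mb) tuples.
    """
    ordered = sorted(markers)  # order by physical position
    rates = []
    for (bp1, cm1), (bp2, cm2) in zip(ordered, ordered[1:]):
        if bp2 == bp1:
            continue  # skip co-located markers to avoid division by zero
        interval_mb = (bp2 - bp1) / 1e6
        rates.append((bp1, bp2, (cm2 - cm1) / interval_mb))
    return rates


# Made-up coordinates, for illustration only.
example = [(0, 0.0), (1_000_000, 2.0), (3_000_000, 3.0)]
rates = recombination_rates(example)
```

Intervals whose rate is far above the chromosome-wide average would be candidate hotspots; the threshold for "far above" is a choice the abstract does not specify.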
We characterize GoSP genes underlying the development of cotton plants with short branches and clustered bolls, a phenotype that allows higher planting density and promotes increased fiber yield per acre.
Although much research has been conducted to characterize microsatellites and develop markers, the distribution of microsatellites remains ambiguous and the use of microsatellite markers in genomic studies and marker-assisted selection is limited. To identify microsatellites for cotton research, we mined 100,290, 83,160, and 56,937 microsatellites with frequencies of 41.2, 49.1, and 74.8 microsatellites per Mb in the recently sequenced Gossypium species: G. hirsutum, G. arboreum, and G. raimondii, respectively. The distributions of microsatellites in their genomes were non-random and were positively correlated with genes and negatively correlated with transposable elements. Of the 77,996 developed microsatellite markers, 65,498 were physically anchored to the 26 chromosomes of G. hirsutum with an average marker density of 34 markers per Mb. We confirmed 67,880 (87%) universal and 7,705 (9.9%) new genic microsatellite markers. The polymorphism was estimated in the above three species by in silico PCR and validated with 505 markers in G. hirsutum. We further predicted 8,825 polymorphic microsatellite markers between G. hirsutum acc. TM-1 and G. barbadense cv. Hai7124. In our study, genome-wide mining and characterization of microsatellites, and marker development were very useful for the saturation of the allotetraploid genetic linkage map, genome evolution studies and comparative genome mapping.
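To make the idea of genome-wide microsatellite mining and per-Mb frequency concrete, here is a minimal sketch that finds perfect tandem repeats with a regular expression and converts a count into a density. The abstract does not state the mining tool or thresholds used, so the unit lengths (2 to 6 bp), the minimum of 4 repeats, and the function names are assumptions chosen only for illustration.

```python
import re


def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, repeat_count) for each perfect tandem repeat.

    Thresholds are illustrative assumptions, not the study's criteria.
    """
    # Lazy unit match tries the shortest repeat unit first; the
    # backreference then requires at least (min_repeats - 1) more copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        if len(set(unit)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


def density_per_mb(count, genome_bp):
    """Microsatellites per megabase for a given count and sequence length."""
    return count / (genome_bp / 1_000_000)


hits = find_microsatellites("TTACACACACACGG")  # toy sequence
```

A production tool would also handle imperfect and compound repeats, which this sketch ignores.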
An improved method of rapid gene mapping and identification was successfully used to map and identify the causal gene at the virescent-1 locus of upland cotton.