Allotetraploid cotton species (Gossypium hirsutum and Gossypium barbadense) have long been cultivated worldwide for natural renewable textile fibers. Draft genome sequences of both species are available, but they are highly fragmented and incomplete 1-4. Here we report reference-grade genome assemblies and annotations for G. hirsutum accession Texas Marker-1 (TM-1) and G. barbadense accession 3-79 by integrating single-molecule real-time sequencing, BioNano optical mapping and high-throughput chromosome conformation capture techniques. Compared with previously assembled draft genomes 1,3, these genome sequences show considerable improvements in contiguity and completeness in regions with high repeat content, such as centromeres. Comparative genomics analyses identify extensive structural variations that probably occurred after polyploidization, highlighted by large paracentric/pericentric inversions in 14 chromosomes. We constructed an introgression line population to introduce favorable chromosome segments from G. barbadense to G. hirsutum, allowing us to identify 13 quantitative trait loci associated with superior fiber quality. These resources will accelerate evolutionary and functional genomic studies in cotton and inform future breeding programs for fiber improvement.
Cotton represents the largest source of natural textile fibers in the world. Over 90% of annual fiber production comes from allotetraploid cotton (G. hirsutum and G. barbadense), which originated from an allopolyploidization event approximately 1-2 million years ago, followed by millennia of asymmetric subgenome selection 5,6. G. hirsutum is cultivated all over the world because of its high yield, and G. barbadense is prized for its superior fiber quality. To cultivate G. hirsutum that produces longer, finer and stronger fibers, one approach is to introduce the superior fiber traits from G. barbadense into G. hirsutum.
A genomics-enabled breeding strategy requires a detailed and robust understanding of genomic organization.
Dt, 0.56 × 10⁻³) (Fig. 1d and Supplementary Fig. 3). This shows that a large amount ... is associated with the development of the long fiber trait in cultivated cotton (Fig. 3b). Domestication has led to the transformation of cotton fiber from brown to white. To understand this phenomenon, we examined two homoeologous gene pairs only subjected to domestication selection in the Dt, 4-COUMARATE:COA LIGASE (4CL) and CHALCONE SYNTHASE (CHS), which encode enzymes involved in the phenylpropanoid metabolic pathway (Fig. 3c and Supplementary Fig. 6 ... Fig. 3c). These SNPs display reductions in nucleotide diversity that occurred during domestication (Fig. 3c). Interestingly, we found that the two SNPs in the ... Fig. 8) 42. We identified a total of 188,360 DNase I-hypersensitive sites (DHSs) in cotton leaves and fibers, of which ca. 47% are common to both tissues (Fig. 4a). DHSs were preferentially identified in chromosomal arms and approximately half were detected in promoter and intergenic regions (Fig. 4b and Supplementary Fig. 9). We found DHSs are hypo-methylated, consistent with previous studies 42 (Fig. 4c) ... H3K4me1 and inactive H3K9me2 (Fig. 4d). Intergenic DHSs were also found to exhibit an enrichment of H3K4me3 and H3K27me3, but depletion of H3K9me2 and no enrichment of H3K4me1 (Fig. 4e).
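The reductions in nucleotide diversity reported above can be illustrated with a small calculation. The following is a minimal sketch, assuming diversity is measured as the average number of pairwise differences per site (π) over equal-length aligned sequences; the excerpt does not state how the authors computed it, and the function name and toy sequences are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site (pi) for aligned sequences.

    Assumption: all sequences are already aligned and of equal length.
    """
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must share the same non-zero length")

    pairs = list(combinations(sequences, 2))
    # Count mismatching sites over every pair of sequences.
    total_diffs = sum(
        sum(1 for a, b in zip(s1, s2) if a != b) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Hypothetical toy alignments, not data from the study.
wild = ["ACGTACGT", "ACGAACGT", "TCGTACGA", "ACGTACGT"]
cultivated = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGAACGT"]

pi_wild = nucleotide_diversity(wild)
pi_cultivated = nucleotide_diversity(cultivated)
print(pi_wild, pi_cultivated)
```

In this toy case the cultivated group has a lower π than the wild group, which is the kind of pattern a domestication sweep would be expected to leave around a selected SNP.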
As predicted, the patterns of chromatin modification marks in cotton are different between genic and TE regions (Supplementary Fig. 10). In addition, genes with promoter DHSs are generally expressed at a higher level in both tissues than those without promoter DHSs (Fig. 4f), and tissue-specific promoter DHSs corresponded to higher levels of gene expression (Fig. 4g) ... Hi-C analysis was carried out using the TM-1 accession to characterize global chromatin interactions. We generated 1.1 billion Hi-C paired-end reads, of which ca. ... possible Hi-C bias, HindIII fragments of less than 2 kb were merged to obtain 305,682 chromosomal anchor regions (Fig. 5a). On the basis of a high-quality genome assembly of TM-1 (Supplementary Fig. 11), we used the Hi-C data to characterize the cotton chromatin interactome (Supplementary Fig. 12) and ... (Fig. 5b), but many topologically associated domain-like (TAD-like) regions were identified (Fig. 5c, Supplementary Fig. 13 and Supplementary ... are less frequent at regions marked by H3K9me2 (Fig. 5d). ... (Fig. 5g). We...
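The merging of HindIII fragments shorter than 2 kb into anchor regions can be sketched as follows. This is only one plausible reading: the excerpt does not describe the merging rule, so the greedy left-to-right strategy, the handling of a short trailing fragment, and the function name `merge_short_fragments` are all assumptions made for illustration.

```python
def merge_short_fragments(cut_sites, min_len=2000):
    """Greedily merge consecutive restriction fragments into anchors.

    cut_sites: sorted positions on one chromosome, including the
    chromosome start and end, so neighbours delimit fragments.
    Assumption: an anchor is closed once it spans at least min_len bp.
    """
    if len(cut_sites) < 2:
        return []

    anchors = []
    start = cut_sites[0]
    for end in cut_sites[1:]:
        if end - start >= min_len:
            anchors.append((start, end))
            start = end

    # A short remainder at the chromosome end is folded into the last anchor.
    if start != cut_sites[-1]:
        if anchors:
            anchors[-1] = (anchors[-1][0], cut_sites[-1])
        else:
            anchors.append((start, cut_sites[-1]))
    return anchors


# Hypothetical cut-site positions, not taken from the TM-1 assembly.
sites = [0, 500, 1200, 2500, 6000, 6800, 7100]
anchors = merge_short_fragments(sites)
print(anchors)
```

With these made-up positions, the three leading short fragments collapse into one anchor and the two trailing short fragments are absorbed by the preceding one, so every anchor ends up at least 2 kb long.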
Gossypium hirsutum accounts for most cotton fibre production, but G. barbadense is valued for its better comprehensive resistance and superior fibre properties. However, the allotetraploid genome of G. barbadense has not been comprehensively analysed. Here we present a high-quality assembly of the 2.57 gigabase genome of G. barbadense, including 80,876 protein-coding genes. The A (or At) subgenome (1.50 Gb) is nearly double the size of the D (or Dt) subgenome (853 Mb), primarily as a result of the expansion of Gypsy elements, including the Peabody and Retrosat2 subclades in the Del clade and the Athila subclade in the Athila/Tat clade. Substantial gene expansion and contraction were observed, and many homoeologous gene pairs with biased expression patterns were identified, suggesting that abundant gene sub-functionalization accompanied allopolyploidization. More specifically, the CesA gene family has adopted differential temporal expression patterns, suggesting an integrated regulatory mechanism of CesA genes from the At and Dt subgenomes for the primary and secondary cellulose biosynthesis of cotton fibre in a “relay race”-like fashion. We anticipate that the G. barbadense genome sequence will advance our understanding of the mechanism of genome polyploidization and underpin genome-wide comparison research in this genus.
Summary Alternative splicing (AS) is a crucial regulatory mechanism in eukaryotes, which acts by greatly increasing transcriptome diversity. The extent and complexity of AS has been revealed in model plants using high-throughput next-generation sequencing. However, this technique is less effective in accurately identifying transcript isoforms in polyploid species because of the high sequence similarity between coexisting subgenomes. Here we characterize AS in the polyploid species cotton. Using Pacific Biosciences single-molecule long-read isoform sequencing (Iso-Seq), we developed an integrated pipeline for Iso-Seq transcriptome data analysis (https://github.com/Nextomics/pipeline-for-isoseq). We identified 176 849 full-length transcript isoforms from 44 968 gene models and updated gene annotation. These data led us to identify 15 102 fibre-specific AS events and estimate that c. 51.4% of homoeologous genes produce divergent isoforms in each subgenome. We reveal that AS allows differential regulation of the same gene by miRNAs at the isoform level. We also show that nucleosome occupancy and DNA methylation play a role in defining exons at the chromatin level. This study provides new insights into the complexity and regulation of AS, and will enhance our understanding of AS in polyploid species. Our methodology for Iso-Seq data analysis will be a useful reference for the study of AS in other species.
Summary Long noncoding RNAs (lncRNAs) are transcripts of at least 200 bp in length, possess no apparent coding capacity and are involved in various biological regulatory processes. Until now, no systematic identification of lncRNAs has been reported in cotton (Gossypium spp.). Here, we describe the identification of 30 550 long intergenic noncoding RNA (lincRNA) loci (50 566 transcripts) and 4718 long noncoding natural antisense transcript (lncNAT) loci (5826 transcripts). LncRNAs are rich in repetitive sequences and preferentially expressed in a tissue‐specific manner. The detection of abundant genome‐specific and/or lineage‐specific lncRNAs indicated their weak evolutionary conservation. Approximately 76% of homoeologous lncRNAs exhibit biased expression patterns towards the At or Dt subgenomes. Compared with protein‐coding genes, lncRNAs showed overall higher methylation levels and their expression was less affected by gene body methylation. Expression validation in different cotton accessions and coexpression network construction helped to identify several functional lncRNA candidates involved in cotton fibre initiation and elongation. Analysis of integrated expression from the subgenomes of lncRNAs generating miR397 and its targets as a result of genome polyploidization indicated their pivotal functions in regulating lignin metabolism in domesticated tetraploid cotton fibres. This study provides the first comprehensive identification of lncRNAs in Gossypium.
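The idea of biased expression between At and Dt homoeologs can be made concrete with a short sketch. The abstract does not say how bias was called, so the twofold threshold, the pseudocount, the function name `classify_bias` and the expression values below are all hypothetical choices for illustration.

```python
import math


def classify_bias(at_expr, dt_expr, min_log2_fold=1.0, pseudocount=1.0):
    """Label a homoeologous pair by the log2 ratio of At to Dt expression.

    Assumptions: a pseudocount avoids division by zero, and a pair is
    called biased when |log2 ratio| >= min_log2_fold (twofold by default).
    """
    ratio = math.log2((at_expr + pseudocount) / (dt_expr + pseudocount))
    if ratio >= min_log2_fold:
        return "At-biased"
    if ratio <= -min_log2_fold:
        return "Dt-biased"
    return "balanced"


# Hypothetical expression values (At, Dt) for four homoeologous pairs.
pairs = {
    "pair1": (30.0, 3.0),
    "pair2": (5.0, 6.0),
    "pair3": (0.5, 12.0),
    "pair4": (8.0, 8.0),
}

calls = {name: classify_bias(at, dt) for name, (at, dt) in pairs.items()}
biased_fraction = sum(c != "balanced" for c in calls.values()) / len(calls)
print(calls, biased_fraction)
```

The fraction of pairs labelled At-biased or Dt-biased is the toy analogue of the roughly 76% figure quoted above; a real analysis would also need replicates and a statistical test rather than a fixed fold-change cutoff.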