Rapeseed (Brassica napus), the second most important oilseed crop globally, originated from an interspecific hybridization between B. rapa and B. oleracea. After this genome collision, B. napus underwent extensive genome restructuring via homoeologous chromosome exchanges, resulting in widespread segmental deletions and duplications. Illicit pairing among genetically similar homoeologous chromosomes during meiosis is common in recent allopolyploids like B. napus, and postpolyploidization restructuring compounds the difficulties of assembling a complex polyploid plant genome. Specifically, genomic rearrangements between highly similar chromosomes are challenging to detect because of limited sequencing read length and ambiguous read alignment. Recent advances in long-read sequencing technologies provide promising new opportunities to unravel the genome complexities of B. napus by spanning the breakpoints of genomic rearrangements with high specificity. Moreover, recent evidence revealed ongoing genomic exchanges in natural B. napus, highlighting the need for multiple reference genomes to capture structural variants between accessions. Here we report the first long-read genome assembly of a winter B. napus cultivar. We sequenced the German winter oilseed rape accession 'Express 617' with long reads at 54.5x coverage. Short reads, linked reads, optical map data and high-density genetic maps were used to further correct and scaffold the assembly to form pseudochromosomes. The assembled Express 617 genome provides another valuable resource for Brassica genomics in understanding the genetic consequences of polyploidization, crop domestication, and breeding of recently formed crop species.
Summary
Evolutionary processes during plant polyploidization and speciation have led to extensive presence–absence variation (PAV) in crop genomes, and there is increasing evidence that PAV associates with important traits. Today, high‐resolution genetic analysis in major crops frequently implements simple, cost‐effective, high‐throughput genotyping from single nucleotide polymorphism (SNP) hybridization arrays; however, these are normally not designed to distinguish PAV from failed SNP calls caused by hybridization artefacts. Here, we describe a strategy to recover valuable information from single nucleotide absence polymorphisms (SNaPs) by population‐based quality filtering of SNP hybridization data to distinguish patterns associated with genuine deletions from those caused by technical failures. We reveal that including SNaPs in genetic analyses elucidates segregation of small to large‐scale structural variants in nested association mapping populations of oilseed rape (Brassica napus), a recent polyploid crop with widespread structural variation. Including SNaP markers in genomewide association studies identified numerous quantitative trait loci, invisible using SNP markers alone, for resistance to two major fungal diseases of oilseed rape, Sclerotinia stem rot and blackleg disease. Our results indicate that PAV has a strong influence on quantitative disease resistance in B. napus and that SNaP analysis using cost‐effective SNP array data can provide extensive added value from ‘missing data’. This strategy might also be applicable for improving the precision of genetic mapping in many important crop species.
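The idea of population-based filtering can be illustrated with a minimal sketch. The abstract does not give the actual filtering criteria, so everything here is an assumption made for illustration: the function names, the representation of a failed call as `None`, the rule that exactly one parent must fail to call, and the 0.3 to 0.7 window on the progeny failure rate (chosen to bracket the roughly 1:1 segregation one might expect for a deletion in a biparental family of homozygous lines).

```python
from typing import List, Optional

# A call is a parental allele label ("A" or "B") or None for a failed call.
Call = Optional[str]


def failure_rate(calls: List[Call]) -> float:
    """Fraction of individuals with a failed call at one marker."""
    if not calls:
        raise ValueError("need at least one progeny call")
    return sum(c is None for c in calls) / len(calls)


def classify_marker(parent1: Call, parent2: Call, progeny: List[Call],
                    low: float = 0.3, high: float = 0.7) -> str:
    """Hypothetical rule separating segregating absences from sporadic failures.

    A marker is flagged as a candidate SNaP when exactly one parent fails
    to call and the progeny failure rate lies in [low, high]. The
    thresholds are illustrative assumptions, not published values.
    """
    one_parent_fails = (parent1 is None) != (parent2 is None)
    rate = failure_rate(progeny)
    if one_parent_fails and low <= rate <= high:
        return "candidate_SNaP"
    if rate == 0 and parent1 is not None and parent2 is not None:
        return "clean_SNP"
    return "technical_or_ambiguous"


# Failures segregate in about half of the progeny and one parent fails.
segregating = classify_marker("A", None,
                              ["A", None, "A", None, None, "A", "A", None])

# A single sporadic failure while both parents call normally.
sporadic = classify_marker("A", "B",
                           ["A", "B", "A", None, "B", "A", "B", "A"])
```

Under these assumed rules, the first marker would be kept as a presence/absence marker while the second would be treated as an ordinary SNP with one unreliable call. A real pipeline would likely also use hybridization signal intensities and the behaviour of neighbouring markers, which this sketch omits.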
Summary
Plant genomes demonstrate significant presence/absence variation (PAV) within a species; however, the factors that lead to this variation have not been studied systematically in Brassica across diploids and polyploids. Here, we developed pangenomes of polyploid Brassica napus and its two diploid progenitor genomes B. rapa and B. oleracea to infer how PAV may differ between diploids and polyploids. Modelling of gene loss suggests that loss propensity is primarily associated with transposable elements in the diploids while in B. napus, gene loss propensity is associated with homoeologous recombination. We use these results to gain insights into the different causes of gene loss, both in diploids and following polyploidization, and pave the way for the application of machine learning methods to understanding the underlying biological and physical causes of gene presence/absence.