Relative to other crops, red clover (Trifolium pratense L.) has several favorable traits that make it an ideal forage crop. Conventional breeding has improved varieties, but modern genomic methods could accelerate progress and facilitate gene discovery. Existing short-read-based genome assemblies of the ∼420 megabase pair (Mbp) genome are fragmented into >135,000 contigs, with numerous order and orientation errors within scaffolds. These errors are probably associated with the plant's biology: red clover displays gametophytic self-incompatibility, which results in inherently high heterozygosity. Here, we present a high-quality long-read-based assembly of red clover with a more than 500-fold reduction in the number of contigs, improved per-base quality, and a contig N50 increased by three orders of magnitude. The 413.5 Mbp assembly is nearly 20% longer than the 350 Mbp short-read assembly and closer to the predicted genome size. We also present quality measures and full-length isoform RNA transcript sequences for assessing accuracy and for future genome annotation. The assembly accurately represents the seven main linkage groups in an allogamous (outcrossing), highly heterozygous plant genome.
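For readers less familiar with the contiguity metrics quoted above, the following is a minimal sketch of how a contig N50 and the relative length gain between two assemblies can be computed. It is an illustration only, not the pipeline used for this assembly: the function names `n50` and `percent_longer` and the toy contig lengths are hypothetical, and only the 413.5 Mbp and 350 Mbp totals come from the text.

```python
def n50(lengths):
    """Return the N50: the contig length L such that contigs of
    length >= L together cover at least half of the assembly."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def percent_longer(new_mbp, old_mbp):
    """Relative length increase of one assembly over another, in percent."""
    return 100.0 * (new_mbp - old_mbp) / old_mbp


# Hypothetical toy contig lengths (arbitrary units), not real data.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)

# Assembly totals quoted in the abstract (Mbp).
gain = percent_longer(413.5, 350.0)

print(f"toy N50: {toy_n50}")
print(f"long-read assembly is {gain:.1f}% longer")
```

With the totals quoted in the abstract, the gain evaluates to roughly 18%, consistent with the "nearly 20%" stated there.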
Subjects Genetics and Genomics, Bioinformatics, Plant Genetics
DATA DESCRIPTION
Background
The species Trifolium pratense L. (red clover, NCBI:txid57577) is an important legume forage crop grown on approximately 4 million hectares worldwide [1]. Red clover is a versatile crop grown as animal feed and/or as a green manure in pure and mixed stands for hay, haylage, silage, and grazing. Red clover is known for its ease of establishment and shade tolerance, and its ability to grow in poorly drained and low pH soils. The reduced need for exogenous nitrogen application owing to its nitrogen-fixing ability and the relatively high protein content of this plant compared with other forage crops provide potential for reducing the environmental footprint of livestock production. Compared to alfalfa, another common legume forage crop, red clover varieties have higher forage yields, are a better source of magnesium to avoid grass tetany in grazing cattle, and may have improved post-harvest protein preservation [2] and bypass protein content in ruminant production