Missing heritability in genome-wide association studies defines a major problem in genetic analyses of complex biological traits [1,2]. The solution to this problem is to identify all causal genetic variants and to measure their individual contributions [3,4]. Here we report a graph pangenome of tomato constructed by precisely cataloguing more than 19 million variants from 838 genomes, including 32 new reference-level genome assemblies. This graph pangenome was used for genome-wide association study analyses and heritability estimation of 20,323 gene-expression and metabolite traits. The average estimated trait heritability is 0.41 compared with 0.33 when using the single linear reference genome. This 24% increase in estimated heritability is largely due to resolving incomplete linkage disequilibrium through the inclusion of additional causal structural variants identified using the graph pangenome. Moreover, by resolving allelic and locus heterogeneity, structural variants improve the power to identify genetic factors underlying agronomically important traits leading to, for example, the identification of two new genes potentially contributing to soluble solid content. The newly identified structural variants will facilitate genetic improvement of tomato through both marker-assisted selection and genomic selection. Our study advances the understanding of the heritability of complex traits and demonstrates the power of the graph pangenome in crop breeding.
Protein-coding gene annotation. To search for homologous genes, the protein sequences from all fern and lycophyte transcriptomes in the OneKP project [1] were retrieved and aligned to the A. capillus-veneris genome using GeneWise [2]. For transcriptome-based prediction, nineteen transcriptomes covering the entire life cycle of A. capillus-veneris were generated in this study (Supplementary Table 8). RNA was extracted using the Qiagen RNeasy protocol and sequenced on an Illumina HiSeq 4000 with a 300 bp insert size. HISAT2 [3] and StringTie [4] were used for transcript assembly [5]. The program PASA (http://pasapipeline.github.io) was used to align spliced transcripts and annotate candidate genes. Ab initio prediction was performed with AUGUSTUS [6], GlimmerHMM [7], and SNAP [8]. Finally, nonredundant gene models were obtained by integrating the gene models predicted from the different datasets with EVidenceModeler (version 1.1.0) [9].
To validate the assembly quality, RNA-seq reads from nineteen tissues (Supplementary Table 8), together with publicly available EST sequences from the NCBI database (downloaded from http://togodb.dbcls.jp/library), were mapped to the A. capillus-veneris genome using HISAT2 [3] and BLAT [10], respectively, with default parameters. The BLAT results were filtered with an identity and coverage cutoff of 0.9.
Identification of noncoding RNAs. We used tRNAscan-SE (version 2.0rc2) [11], with default parameters, to search for tRNAs in the A. capillus-veneris genome. A total of 1,624 tRNAs were found. Moreover, the Rfam 14.0 database [12], which includes 3,445 noncoding RNA families, was used to annotate additional noncoding RNAs (ncRNAs), including miRNAs, snRNAs, and tRNAs, using the INFERNAL program (version 1.1.2) [13]. We predicted rRNAs (5S, 5.8S, 28S, and 18S) with the HMM-based rRNA predictor Barrnap (version 0.9, https://github.com/tseemann/barrnap#barrnap), using default parameters. In total, we identified 145 5S, 75 5.8S, 155 28S, and 165 18S sequences and their locations within the genome assembly of A. capillus-veneris.
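The BLAT filtering step mentioned in the assembly validation (identity and coverage cutoff of 0.9) can be illustrated with a minimal Python sketch. The text does not say how identity and coverage were computed, so the definitions here are assumptions: identity is taken as matching bases over aligned bases, and coverage as the aligned span of the query over the query length. The function name `filter_psl` and the `PslHit` record are hypothetical and not part of BLAT; only the standard 21-column PSL layout is relied on.

```python
from typing import Iterable, Iterator, NamedTuple


class PslHit(NamedTuple):
    """One BLAT alignment that passed the filter (hypothetical record type)."""
    query: str
    target: str
    identity: float
    coverage: float


def filter_psl(lines: Iterable[str],
               min_identity: float = 0.9,
               min_coverage: float = 0.9) -> Iterator[PslHit]:
    """Yield PSL alignments meeting both cutoffs.

    Assumed definitions (not stated in the source text):
      identity = (matches + repMatches) / (matches + misMatches + repMatches)
      coverage = (qEnd - qStart) / qSize
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip PSL header lines and anything that is not a 21-column record.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        matches, mismatches, rep_matches = (int(fields[i]) for i in (0, 1, 2))
        q_size, q_start, q_end = (int(fields[i]) for i in (10, 11, 12))
        aligned = matches + mismatches + rep_matches
        if aligned == 0 or q_size == 0:
            continue
        identity = (matches + rep_matches) / aligned
        coverage = (q_end - q_start) / q_size
        if identity >= min_identity and coverage >= min_coverage:
            yield PslHit(fields[9], fields[13], identity, coverage)


if __name__ == "__main__":
    # Synthetic example rows, not real data from the study.
    good = "\t".join(["95", "5", "0", "0", "0", "0", "0", "0", "+", "est1",
                      "100", "0", "100", "chr1", "1000", "10", "110",
                      "1", "100,", "0,", "10,"])
    partial = "\t".join(["50", "0", "0", "0", "0", "0", "0", "0", "+", "est2",
                         "100", "0", "50", "chr1", "1000", "200", "250",
                         "1", "50,", "0,", "200,"])
    for hit in filter_psl([good, partial]):
        print(hit)
```

In this sketch only `est1` is reported: the `est2` alignment is a perfect match but spans half of the query, so it fails the coverage cutoff. Other conventions exist (for example, counting gaps against identity), and they would change which alignments are retained.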