Naturally occurring functional genetic variation is often employed to identify genetic loci that regulate specific traits. Existing approaches to link functional genetic variation to quantitative phenotypic outcomes typically evaluate one or several traits at a time. Advances in high-throughput phenotyping now enable datasets that include information on dozens or hundreds of traits scored across multiple environments. Here, we develop an approach that uses data from many phenotypic traits simultaneously to identify causal genetic loci. Using data for 260 traits scored across a maize diversity panel, we demonstrate that this approach identifies a set of genes distinct from those identified by conventional genome-wide association. The genes identified using this many-trait approach are more likely to be independently validated than the genes identified by conventional analysis of the same dataset. Genes identified by the new many-trait approach share a number of molecular, population genetic, and evolutionary features with a gold standard set of genes characterized through forward genetics. These features, as well as substantially stronger functional enrichment and purification, separate them both from genes identified by conventional genome-wide association and from the overall population of annotated gene models. These results are consistent with a large subset of annotated gene models in maize playing little or no role in determining organismal phenotypes.
GWAS Analysis
GLM GWAS analyses were conducted using the algorithm first defined by Price and coworkers [31], and FarmCPU GWAS analyses using the algorithm defined by Liu and colleagues [32]. Both algorithms were run using the R-based software rMVP (A Memory-efficient, Visualization-enhanced, and Parallel-accelerated Tool For Genome-Wide Association Study) (https://github.com/XiaoleiLiuBio/rMVP). Both analysis methods were run using maxLoop = 10 and the variance component method method.bin = "Fast-LMM" [55]. The first three principal components were included as additional covariates to control for population structure. For comparison to GPWAS results, each gene was assigned the p-value of the single most significant SNP among all the SNPs assigned to that gene across all 260 analyzed phenotypes in the GWAS model.
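The gene-level summary described above (one p-value per gene, taken from its most significant SNP across all phenotypes) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `gene_level_min_p`, the input layout (a SNP-to-gene dictionary plus `(phenotype, snp_id, p_value)` tuples), and the assumption that each SNP is assigned to at most one gene are all hypothetical choices made for illustration.

```python
def gene_level_min_p(snp_to_gene, gwas_results):
    """Collapse SNP-level GWAS results to one p-value per gene.

    snp_to_gene:  dict mapping SNP id -> gene id (hypothetical layout;
                  assumes each SNP is assigned to at most one gene).
    gwas_results: iterable of (phenotype, snp_id, p_value) tuples pooled
                  over every analyzed phenotype.

    Returns a dict mapping gene id -> (p_value, snp_id, phenotype) for the
    single most significant SNP of that gene across all phenotypes.
    """
    best = {}
    for phenotype, snp_id, p_value in gwas_results:
        gene = snp_to_gene.get(snp_id)
        if gene is None:
            # SNP is not assigned to any gene, so it is ignored here.
            continue
        # Keep the smallest p-value seen so far for this gene.
        if gene not in best or p_value < best[gene][0]:
            best[gene] = (p_value, snp_id, phenotype)
    return best


if __name__ == "__main__":
    # Toy data, invented purely to show the call pattern.
    toy_map = {"snp1": "geneA", "snp2": "geneA", "snp3": "geneB"}
    toy_results = [
        ("trait1", "snp1", 1e-3),
        ("trait2", "snp2", 1e-6),
        ("trait1", "snp3", 0.2),
    ]
    for gene, (p, snp, trait) in gene_level_min_p(toy_map, toy_results).items():
        print(gene, p, snp, trait)
```

Because the minimum is taken over all SNPs and all phenotypes, genes with more assigned SNPs have more chances to reach a small p-value; the sketch reproduces only the selection rule stated in the text and applies no correction for that.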
Nested Association Mapping Comparison
Published associations identified for 41 phenotypes scored across 5,000 maize recombinant inbred lines were retrieved from Panzea (http://cbsusrv04.tc.cornell.edu/users/panzea/download.aspx?filegroupid=14) [27]. Following the thresholding proposed in that paper, SNP and CNV (copy number variant) hits with a resample model inclusion probability ≥ 0.05 that were either within the longest annotated transcript for each gene (AGPv2.16) or within 15 kb upstream or downstream of the annotated transcription start and stop sites were assigned to that gene. Gene models were converted from B73 RefGenV2 to B73 RefGenV4 using a conversion list published on MaizeGDB (https://www.maizegdb.org/search/gene/download_gene_xrefs.php?relative=v4).
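The hit-to-gene assignment rule above can be sketched as follows. This is an illustrative Python example under stated assumptions, not the pipeline used in the study: the `Gene` and `Hit` records, the function `assign_hits_to_genes`, and the toy coordinates are hypothetical; each hit is reduced to a single position (real CNV hits span intervals); gene coordinates are assumed to satisfy `start <= end` regardless of strand; and a hit falling in the windows of two genes is assigned to both.

```python
from collections import defaultdict
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int  # start of the longest annotated transcript
    end: int    # end of the longest annotated transcript


class Hit(NamedTuple):
    chrom: str
    pos: int     # single-position simplification (assumption)
    rmip: float  # resample model inclusion probability


def assign_hits_to_genes(genes, hits, flank=15_000, min_rmip=0.05):
    """Assign hits with rmip >= min_rmip to every gene whose transcript,
    extended by `flank` bp on each side, contains the hit position."""
    assigned = defaultdict(list)
    for hit in hits:
        if hit.rmip < min_rmip:
            continue  # below the inclusion-probability threshold
        for gene in genes:
            if gene.chrom != hit.chrom:
                continue
            if gene.start - flank <= hit.pos <= gene.end + flank:
                assigned[gene.gene_id].append(hit)
    return dict(assigned)


if __name__ == "__main__":
    # Toy coordinates, invented for illustration only.
    toy_genes = [Gene("g1", "1", 100_000, 105_000),
                 Gene("g2", "1", 130_000, 131_000)]
    toy_hits = [Hit("1", 90_000, 0.10), Hit("1", 118_000, 0.06),
                Hit("1", 102_000, 0.01), Hit("2", 102_000, 0.50)]
    for gene_id, gene_hits in assign_hits_to_genes(toy_genes, toy_hits).items():
        print(gene_id, [h.pos for h in gene_hits])
```

The nested loop is quadratic in the worst case, which is adequate for a sketch; an interval tree or a sorted sweep per chromosome would be a more natural choice at genome scale.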
Gene Expression ...