Genome-wide association studies (GWAS) have identified loci linked to hundreds of traits in many different species. Yet, because linkage disequilibrium implicates a broad region surrounding each identified locus, the causal genes often remain unknown. This problem is especially pronounced in nonhuman, nonmodel species, where functional annotations are sparse and there is frequently little information available for prioritizing candidate genes. We developed a computational approach, Camoco, that integrates loci identified by GWAS with functional information derived from gene coexpression networks. Using Camoco, we prioritized candidate genes from a large-scale GWAS examining the accumulation of 17 different elements in maize (Zea mays) seeds. Strikingly, we observed that the performance of our approach depended strongly on the type of coexpression network used: expression variation across genetically diverse individuals in a relevant tissue context (in our case, roots, which are the primary elemental uptake and delivery system) outperformed the alternative networks. Two candidate genes identified by our approach were validated using mutants. Our study demonstrates that coexpression networks provide a powerful basis for prioritizing candidate causal genes from GWAS loci but suggests that the success of such strategies can depend heavily on the gene expression data context. Both the software and the lessons on integrating GWAS data with coexpression networks generalize to species beyond maize.
We present the nucleotide sequence of the galactokinase gene (galK) of Escherichia coli including its 5' and 3' flanking regions. This DNA sequence derives from the λgal8 transducing phage and is identical to the sequence present in the galK gene fusion vectors, pKO and pKG, commonly used to study transcriptional regulatory elements. We define the precise 3' junction between the bacterial and phage sequences in λgal8 and demonstrate that this junction probably results from a homologous recombination event between identical 9 bp sequences common to the gal operon and phage λ. Moreover, we examine the 300 bp region located immediately beyond galK for transcription termination function and find no gal operon terminator. Lastly, we compare the galK genes of E. coli and the yeast S. cerevisiae and find several regions of strong homology, among which is a potential ATP-binding site homology shared by a variety of ATP-binding proteins including protein kinases encoded by mammalian oncogenes.
Genome-wide association studies (GWAS) have identified thousands of loci linked to hundreds of traits in many different species. However, for most loci, the causal genes and the cellular processes they contribute to remain unknown. This problem is especially pronounced in species where functional annotations are sparse. Given little information about a gene, patterns of expression are a powerful tool for inferring biological function. Here, we developed a computational framework called Camoco that integrates loci identified by GWAS with functional information derived from gene co-expression networks. We built co-expression networks from three distinct biological contexts and established the precision of our method with simulated GWAS data. We applied Camoco to prioritize candidate genes from a large-scale GWAS examining the accumulation of 17 different elements in maize seeds, demonstrating the need to match GWAS datasets with co-expression networks derived from the appropriate biological context. Furthermore, our results show that simply taking the genes closest to significant GWAS loci will often lead to spurious results, indicating the need for proper functional modeling and a reliable null distribution when integrating these high-throughput data types. We performed functional validation on a gene identified by our approach using mutants and annotated other high-priority candidates with ontological enrichment and curated literature support, resulting in a targeted set of candidate genes that drive elemental accumulation in maize grain.
Genome‐wide association studies (GWAS) have proven to be a valuable approach for identifying genetic intervals associated with phenotypic variation in Medicago truncatula. These intervals can vary in size, depending on the historical local recombination. Typically, significant intervals span numerous gene models, limiting the ability to resolve high‐confidence candidate genes underlying the trait of interest. Additional genomic data, including gene co‐expression networks, can be combined with the genetic mapping information to successfully identify candidate genes. Co‐expression network analysis provides information about the functional relationships of each gene through its similarity of expression patterns to other well‐defined clusters of genes. In this study, we integrated data from GWAS and co‐expression networks to pinpoint candidate genes that may be associated with nodule‐related phenotypes in M. truncatula. We further investigated a subset of these genes and confirmed that several had existing evidence linking them to nodulation, including MEDTR2G101090 (PEN3‐like), a previously validated gene associated with nodule number.
Genome-wide association studies (GWAS) have proven to be a valuable approach for identifying genetic intervals associated with phenotypic variation in Medicago truncatula. These intervals can vary in size, depending on the historical local recombination near each significant interval. Typically, significant intervals span numerous gene models, limiting the ability to resolve high-confidence candidate genes underlying the trait of interest. Additional genomic data, including gene co-expression networks, can be combined with the genetic mapping information to successfully identify candidate genes. Co-expression network analysis provides information about the functional relationships of each gene through its similarity of expression patterns to other well-defined clusters of genes. In this study, we integrated data from GWAS and co-expression networks to pinpoint candidate genes that may be associated with nodule-related phenotypes in Medicago truncatula. We further investigated a subset of these genes and confirmed that several had existing evidence linking them to nodulation, including MEDTR2G101090 (PEN3-like), a previously validated gene associated with nodule number.