We present a large-scale analysis of mRNA coexpression based on 60 large human data sets containing a total of 3924 microarrays. We sought pairs of genes that were reliably coexpressed (based on the correlation of their expression profiles) in multiple data sets, establishing a high-confidence network of 8805 genes connected by 220,649 "coexpression links" that are observed in at least three data sets. Confirmed positive correlations between genes were much more common than confirmed negative correlations. We show that confirmation of coexpression in multiple data sets is correlated with functional relatedness, and show how cluster analysis of the network can reveal functionally coherent groups of genes. Our findings demonstrate how the large body of accumulated microarray data can be exploited to increase the reliability of inferences about gene function.
[Supplemental material is available online at www.genome.org and http://microarray.cpmc.columbia.edu/tmm.]
Gene expression microarray data are a form of high-throughput genomics data providing relative measurements of mRNA levels for thousands of genes in a biological sample. In the last few years, hundreds of laboratories have collected and analyzed microarray data, and the data are beginning to appear in public databases or on researchers' Web sites. These resources serve at least two purposes. One is as an archive of the data, which allows other researchers to confirm the results that have been published by the originator of the data. A second use is to permit novel analyses of the data that go beyond what was envisioned or possible at the time of the original study. A novel analysis could involve just a single data set, or a meta-analysis of many data sets (where a "data set" is a group of microarrays that were collected together, and typically described as a group in a single publication).
The combined analysis of multiple data sets forms the main topic of this paper.
Most existing studies that have analyzed multiple independently collected microarray data sets have focused on differential expression, comparing two or more similar data sets to look for genes that distinguish different sets of samples (Breitling et al.
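The idea of a "coexpression link" confirmed in several data sets, as described in the abstract above, can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the fixed Pearson cutoff `r_min`, and the toy data layout (one dict of gene name to expression profile per data set) are assumptions made for illustration, and only positive correlations are counted.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def confirmed_links(datasets, r_min=0.9, min_datasets=3):
    """Return gene pairs whose correlation reaches r_min in at least min_datasets data sets.

    datasets: list of dicts mapping gene name -> list of expression values.
    r_min is a hypothetical fixed cutoff chosen for this sketch; the source
    does not state how its per-data-set threshold was set.
    """
    votes = Counter()
    for profiles in datasets:
        # Sort gene names so each pair has one canonical (g1, g2) ordering.
        for g1, g2 in combinations(sorted(profiles), 2):
            if pearson(profiles[g1], profiles[g2]) >= r_min:
                votes[(g1, g2)] += 1
    return {pair: n for pair, n in votes.items() if n >= min_datasets}
```

Requiring a pair to pass the cutoff in several independent data sets trades sensitivity for reliability: a link seen once may reflect noise or a study-specific effect, while a link seen repeatedly is less likely to.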
Microdeletions of 22q11.2 represent one of the highest known genetic risk factors for schizophrenia. It is likely that more than one gene contributes to the marked risk associated with this locus. Two of the candidate risk genes encode the enzymes proline dehydrogenase (PRODH) and catechol-O-methyltransferase (COMT), which modulate the levels of a putative neuromodulator (L-proline) and the neurotransmitter dopamine, respectively. Mice that model the state of PRODH deficiency observed in humans with schizophrenia show increased neurotransmitter release at glutamatergic synapses as well as deficits in associative learning and response to psychomimetic drugs. Transcriptional profiling and pharmacological manipulations identified a transcriptional and behavioral interaction between the Prodh and Comt genes that is likely to represent a homeostatic response to enhanced dopaminergic signaling in the frontal cortex. This interaction modulates a number of schizophrenia-related phenotypes, providing a framework for understanding the high disease risk associated with this locus, the expression of the phenotype, or both.
Modification of oleic acid (C18:1) and linolenic acid (C18:3) contents in seeds is one of the major goals for quality breeding after removal of erucic acid in oilseed rape (Brassica napus). The fatty acid desaturase genes FAD2 and FAD3 have been shown to be the major genes controlling C18:1 and C18:3 contents. However, the genome structure and locus distributions of the two gene families in amphidiploid B. napus are still not completely understood. In the present study, all copies of FAD2 and FAD3 genes in the A- and C-genomes of B. napus and its two diploid progenitor species, Brassica rapa and Brassica oleracea, were identified through bioinformatic analysis and extensive molecular cloning. Two FAD2 genes exist in B. rapa and B. oleracea, and four copies of FAD2 genes exist in B. napus. Three and six copies of FAD3 genes were identified in the diploid species and the amphidiploid species, respectively. The genetic control of high C18:1 and low C18:3 contents in a doubled haploid (DH) population was investigated through mapping of the quantitative trait loci (QTL) for the traits and molecular cloning of the underlying genes. One major QTL of BnaA.FAD2.a, located on chromosome A5, was responsible for the high C18:1 content. A deletion mutation in the BnaA.FAD2.a locus was uncovered, which represents a previously unidentified allele for the high oleic variation in B. napus. Two major QTLs on chromosomes A4 and C4 were found to be responsible for the low C18:3 content in the DH population as well as in SW Hickory. Furthermore, several single base pair changes in BnaA.FAD3.b and BnaC.FAD3.b were identified as causing the low C18:3 phenotype. Based on the results of genetic mapping and the identified sequences, allele-specific markers were developed for FAD2 and FAD3 genes. In particular, single-nucleotide amplified polymorphism markers for FAD3 alleles were demonstrated to be a reliable type of SNP marker for unambiguous identification of genotypes with different C18:3 contents in amphidiploid B. napus.
One of the challenges in the analysis of gene expression data is placing the results in the context of other data available about genes and their relationships to each other. Here, we approach this problem in the study of gene expression changes associated with age in two areas of the human prefrontal cortex, comparing two computational methods. The first method, "overrepresentation analysis" (ORA), is based on statistically evaluating the fraction of genes in a particular gene ontology class found among the set of genes showing age-related changes in expression. The second method, "functional class scoring" (FCS), examines the statistical distribution of individual gene scores among all genes in the gene ontology class and does not involve an initial gene selection step. We find that FCS yields more consistent results than ORA, and that the results of ORA depend strongly on the gene selection threshold. Our findings highlight the utility of functional class scoring for the analysis of complex expression data sets and emphasize the advantage of considering all available genomic information rather than sets of genes that pass a predetermined "threshold of significance."
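The contrast between the two methods in the abstract above can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: ORA is shown here as a hypergeometric upper-tail test, and FCS as a comparison of the class's mean gene score against randomly drawn gene sets of the same size. Both are common choices, but the abstract does not specify the exact statistics used. The function names, the `threshold` and `n_resamples` parameters, and the gene-score dict layout are assumptions.

```python
import random
from math import comb


def ora_pvalue(scores, gene_class, threshold):
    """ORA sketch: select genes with score >= threshold, then ask whether
    gene_class is overrepresented among them (hypergeometric upper tail)."""
    members = {g for g in gene_class if g in scores}
    selected = {g for g, s in scores.items() if s >= threshold}
    N, K, n = len(scores), len(members), len(selected)
    k = len(selected & members)
    # P(X >= k) when drawing n genes from N, of which K belong to the class.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def fcs_pvalue(scores, gene_class, n_resamples=10000, seed=0):
    """FCS sketch: no gene selection step. Compare the mean score of the class
    with means of random gene sets of the same size (empirical p-value)."""
    rng = random.Random(seed)
    members = [g for g in gene_class if g in scores]
    all_scores = list(scores.values())
    observed = sum(scores[g] for g in members) / len(members)
    hits = 0
    for _ in range(n_resamples):
        sample = rng.sample(all_scores, len(members))
        if sum(sample) / len(members) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)
```

With ten toy genes scored 10 down to 1 and a class made up of the three top-scoring genes, `ora_pvalue` gives 1/120 at a threshold of 8 but 0.3 at a threshold of 10. That shift illustrates the threshold dependence the abstract describes. `fcs_pvalue` has no threshold to choose.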