ArrayExpress is a public database for high throughput functional genomics data. ArrayExpress consists of two parts—the ArrayExpress Repository, which is a MIAME supportive public archive of microarray data, and the ArrayExpress Data Warehouse, which is a database of gene expression profiles selected from the repository and consistently re-annotated. Archived experiments can be queried by experiment attributes, such as keywords, species, array platform, authors, journals or accession numbers. Gene expression profiles can be queried by gene names and properties, such as Gene Ontology terms and gene expression profiles can be visualized. ArrayExpress is a rapidly growing database, currently it contains data from >50 000 hybridizations and >1 500 000 individual expression profiles. ArrayExpress supports community standards, including MIAME, MAGE-ML and more recently the proposal for a spreadsheet based data exchange format: MAGE-TAB. Availability: .
Although there is only one human genome sequence, different genes are expressed in many different cell types and tissues, as well as in different developmental stages or diseases. The structure of this 'expression space' is still largely unknown, as most transcriptomics experiments focus on sampling small regions. We have constructed a global gene expression map by integrating microarray data from 5,372 human samples representing 369 different cell and tissue types, disease states and cell lines. These have been compiled in an online resource (http://www.ebi.ac.uk/gxa/array/U133A) that allows the user to search for a gene of interest and find the conditions in which it is over-or underexpressed, or, conversely, to find which genes are over-or underexpressed in a particular condition. An analysis of the structure of the expression space reveals that it can be described by a small number of distinct expression profile classes and that the first three principal components of this space have biological interpretations. The hematopoietic system, solid tissues and incompletely differentiated cell types are arranged on the first principal axis; cell lines, neoplastic samples and nonneoplastic primary tissue-derived samples are on the second principal axis; and nervous system is separated from the rest of the samples on the third axis. We also show below that most cell lines cluster together rather than with their tissues of origin.The widely used GNF Gene Expression Atlas 1,2 includes a variety of normal tissue and cell types as well as certain disease states. Many more different biological states, such as rare diseases or particular cell subtypes, exist. It is impractical for a single dedicated experiment to generate a comprehensive expression data set covering all biological conditions, partly owing to cost, but also because some conditions are studied only in specialized laboratories. Even so, we can use computational approaches to integrate the wealth of experiments that already have been performed.Integration of independent microarray studies is challenging, as microarrays do not measure gene expression in any absolute units. Several studies have integrated single-platform 3 and cross platform 4-6 data from single-channel oligonucleotide arrays yielding consistent results. It has been generally accepted, however, that only data from the same platform can be reliably integrated on a quantitative level 7 . Integration is also challenging because of the unavoidable complexity of sample descriptions. The Unified Medical Language System has been used to re-annotate free text-based sample descriptions 8 ; however, extracting information from brazma@ebi.ac.uk .
Motivation: Describing biological sample variables with ontologies is complex due to the cross-domain nature of experiments. Ontologies provide annotation solutions; however, for cross-domain investigations, multiple ontologies are needed to represent the data. These are subject to rapid change, are often not interoperable and present complexities that are a barrier to biological resource users.Results: We present the Experimental Factor Ontology, designed to meet cross-domain, application focused use cases for gene expression data. We describe our methodology and open source tools used to create the ontology. These include tools for creating ontology mappings, ontology views, detecting ontology changes and using ontologies in interfaces to enhance querying. The application of reference ontologies to data is a key problem, and this work presents guidelines on how community ontologies can be presented in an application ontology in a data-driven way.Availability: http://www.ebi.ac.uk/efoContact: malone@ebi.ac.ukSupplementary information: Supplementary data are available at Bioinformatics online.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.