Microarray-based studies of global gene expression (GE) have generated a large amount of data that can be mined for further insights into disease and physiology. Meta-analysis of these data is hampered by technical limitations arising from the many different platforms, gene annotations and probes used across studies. We tested the feasibility of a meta-analysis of GE studies to determine a transcriptional signature of hematopoietic stem and progenitor cells. Data from studies that used normal bone marrow-derived hematopoietic progenitors were integrated using both RefSeq and UniGene identifiers. Despite the variability introduced by experimental conditions and different microarray platforms, our meta-analytical approach distinguished biologically distinct normal tissues, which clustered according to their cell of origin. When disease states were examined, GE profiles of leukemia and myelodysplasia progenitors tended to cluster with normal progenitors and remained distinct from other normal tissues, further validating the discriminatory power of this meta-analysis. Analysis of 57 normal hematopoietic stem and progenitor cell GE samples was then used to determine a gene expression signature characteristic of these cells. Genes that were most uniformly expressed in progenitors and, at the same time, differentially expressed relative to other normal tissues were involved in important biological processes such as cell cycle regulation and hematopoiesis. Validation studies on a different microarray platform demonstrated enrichment of several genes, such as SMARCE1, Septin 6 and others, not previously implicated in hematopoiesis. Notably, alpha-integrin, the only common stemness gene identified in a recent comparative murine analysis (Science 302(5644):393), was also enriched in our dataset, supporting the usefulness of this analytical approach.
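One way to picture identifier-based integration across platforms is the minimal Python sketch below. It is an illustration under stated assumptions, not the authors' pipeline: each sample is assumed to arrive as a dictionary of probe values plus a platform-specific probe-to-gene mapping (RefSeq or UniGene), probes mapping to the same gene are averaged, and only gene IDs present in every sample are kept. The function names `collapse_to_gene_ids` and `merge_platforms`, the probe names and the gene IDs are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

def collapse_to_gene_ids(probe_values, probe_to_gene):
    """Average all probes of one sample that map to the same gene ID.

    probe_values: {probe_id: expression value or None}
    probe_to_gene: {probe_id: RefSeq or UniGene ID} for that sample's platform
    Unmapped probes and missing values are dropped.
    """
    grouped = defaultdict(list)
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None and value is not None:
            grouped[gene].append(value)
    return {gene: mean(vals) for gene, vals in grouped.items()}

def merge_platforms(samples):
    """Build a samples x genes matrix restricted to IDs present in every sample.

    samples: list of (probe_values, probe_to_gene) pairs, one per sample.
    Returns (common_ids, matrix) with matrix[i][j] = value of common_ids[j]
    in sample i.
    """
    if not samples:
        return [], []
    collapsed = [collapse_to_gene_ids(values, mapping) for values, mapping in samples]
    # keep only gene IDs measured in every sample (drops "missing values")
    common = set.intersection(*(set(c) for c in collapsed))
    common_ids = sorted(common)
    return common_ids, [[c[g] for g in common_ids] for c in collapsed]

# Hypothetical probes and gene IDs on two platforms
u95_sample = ({"p1": 5.0, "p2": 3.0}, {"p1": "gene_A", "p2": "gene_B"})
u133_sample = ({"q1": 4.0, "q2": 6.0, "q3": 1.0},
               {"q1": "gene_A", "q2": "gene_A", "q3": "gene_C"})
print(merge_platforms([u95_sample, u133_sample]))  # (['gene_A'], [[5.0], [5.0]])
```

Averaging duplicate probes is only one possible collapsing rule; keeping the maximum or the most variable probe are common alternatives, and the abstract does not say which rule was applied.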
Chronic idiopathic myelofibrosis (MF) is a clonal hematopoietic disorder that leads to progressive marrow fibrosis and peripheral cytopenias. Very little is known about the role of chromosomal alterations and DNA methylation in the pathobiology of this disease. We combined gene expression analysis, high-density array-based comparative genomic hybridization (aCGH) and genome-wide methylation analysis to perform an integrated genomic analysis of MF. Gene expression analysis was performed using 37K oligonucleotide maskless arrays, and high-density aCGH was performed at 6 kb resolution on the NimbleGen platform. Genome-wide methylation was analyzed by a recently described method (Khulan et al., Genome Res. 2006 Aug;16(8)) that uses differential methylation-specific digestion by HpaII and MspI, followed by PCR amplification, two-color labeling and hybridization, to quantitatively determine the methylation of individual promoter CpG islands. In a pilot study of 4 patients with MF, aCGH revealed a very high number of microdeletions (range 622–1148, mean ± SD 555±144) and amplifications (range 463–770, mean ± SD 781±246). Twenty-three common regions were amplified in all patients, including regions of chr3p25, chr8p21, chr12q24, chr14q32, chr17p13 and chr17q12, which encode a novel set of genes including B-cell CLL/lymphoma 7A, GTPase activating Rap, BRF1 and others. Five regions were commonly deleted in all samples (chr1q31, chr5q12, chr20p13, chrXq21, chrXq28). Thirty-eight DNA segments were deleted and 142 amplified in 75% of the samples. These segments encode several potentially pathogenic genes, including transcription factors, cytokines and cytokine receptors. A custom human oligonucleotide array was used to determine methylation by calculating the HpaII/MspI cut-fragment intensity ratio. All patient samples had a very high level of methylation (range 64–82%, mean 72% ± 8.6%).
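The HpaII/MspI ratio calculation can be illustrated with the small Python sketch below. It relies on the enzymes' known behavior: HpaII does not cut CCGG sites whose internal cytosine is methylated, whereas its isoschizomer MspI cuts them regardless of CpG methylation, so a low log2(HpaII/MspI) ratio marks a methylated locus. The cut-off of 0, the toy intensities and the function names `call_methylation` and `percent_methylated` are hypothetical placeholders; the thresholds and normalization used in the study are not stated here.

```python
import math

def call_methylation(hpaii, mspi, log_ratio_threshold=0.0):
    """Call each locus methylated (True) or unmethylated (False).

    hpaii, mspi: {locus_id: intensity} from the HpaII and MspI channels.
    A locus is called methylated when log2(HpaII / MspI) falls below
    log_ratio_threshold (a hypothetical cut-off).
    """
    calls = {}
    for locus, m in mspi.items():
        h = hpaii.get(locus)
        # skip loci without positive signal in both channels
        if h is None or h <= 0 or m <= 0:
            continue
        calls[locus] = math.log2(h / m) < log_ratio_threshold
    return calls

def percent_methylated(calls):
    """Percentage of called loci that are methylated."""
    if not calls:
        raise ValueError("no loci with usable signal")
    return 100.0 * sum(calls.values()) / len(calls)

# Toy intensities (hypothetical): loci a and c have low HpaII signal.
hpaii = {"a": 100, "b": 400, "c": 50, "d": 800}
mspi = {"a": 400, "b": 400, "c": 200, "d": 200}
calls = call_methylation(hpaii, mspi)
print(calls, percent_methylated(calls))
```

A per-sample percentage such as the one returned by `percent_methylated` is the kind of summary behind the methylation range reported above, although the study's exact definition may differ.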
Expression was significantly decreased for methylated genes (p < .0001, t-test), demonstrating the functional relevance of this assay. Analysis of common differentially methylated genes (relative to normal samples) and their validation are ongoing. Microdeletions and amplifications detected by aCGH did not correlate with changes in global expression of the involved genes. Interestingly, when data from all platforms were combined, methylation of genes with altered copy number led to significant decreases in gene expression (p = .03, t-test). This result suggests that pathogenic changes in gene expression in MF arise from an interplay between DNA copy number alterations and methylation of the remaining alleles. The high rate of methylation observed in MF suggests that epigenetic gene silencing may play an important role in pathogenesis and points to the potential utility of hypomethylating agents in this disease.
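The abstract reports t-test p-values without specifying the variant. As a hedged sketch, the Python code below computes Welch's unequal-variance t statistic and its Welch-Satterthwaite degrees of freedom for two groups, for example expression of methylated versus unmethylated genes; whether the authors used Welch's or Student's test is an assumption, and `welch_t` and the toy values are hypothetical. Turning the statistic into a p-value requires the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`, which returns the same statistic together with a p-value.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two values")
    # squared standard errors of each group mean (sample variance / n)
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    if se2_a + se2_b == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df

# Hypothetical log2 expression of methylated vs unmethylated genes
methylated = [1.0, 2.0, 3.0]
unmethylated = [4.0, 5.0, 6.0]
t, df = welch_t(methylated, unmethylated)
print(round(t, 3), round(df, 3))
```

A negative statistic here means lower mean expression in the first group, the direction reported for methylated genes.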
Microarray-based studies of global gene expression (GE) have led to dramatic advances in our understanding of various biological processes and have deposited a large amount of data in public repositories such as the Gene Expression Omnibus (GEO). Meta-analysis of these data has the potential to yield important biological information but is hampered by technical issues arising from the different platforms and gene annotations used in various studies. To conduct such a meta-analysis, we identified a total of 69 individual normal hematopoietic stem cell (HSC) GE datasets in GEO (9 whole bone marrow, 57 CD34+ cell studies). These had been generated on 3 microarray platforms (Affymetrix U95, U133 A/B and U133 Plus 2.0). Because the probe identifiers and complementary cDNAs differed between these platforms, we integrated the data using both UniGene and RefSeq protein IDs, obtaining a total of 8598 common UniGene and 8345 RefSeq probes after removing missing values. Unsupervised clustering of normalized GE values showed that experimental conditions, the laboratory performing the experiments and the microarray platform can all introduce variability into GE patterns from similar cell sources. To determine how dissimilar these datasets were from those obtained from biologically distinct tissues, GE profiles from various human tissues (brain, heart, kidney, etc.) were obtained from GEO and compared with those of HSCs. Unsupervised clustering showed that samples from the same tissue of origin clustered together despite different platforms and laboratories, demonstrating that our approach can group biologically distinct tissues in spite of experimental and platform variability. To further test the discriminatory ability of the meta-analysis, we took datasets from hematologic malignancies and from normal hematopoietic and non-hematopoietic tissues analyzed on the same platform (U133).
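Unsupervised clustering of this kind is often done by agglomerative hierarchical clustering. The naive Python sketch below uses average linkage on the Euclidean distance between sample vectors and stops at a chosen number of clusters; the linkage, the distance, the stopping rule and the function names are assumptions for illustration, not a statement of the software the authors used. It recomputes cluster distances from scratch at every merge, which is roughly cubic in the number of samples and only suitable for small inputs.

```python
import math
from itertools import combinations

def euclidean(u, v):
    """Euclidean distance between two equal-length expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def average_linkage(samples, n_clusters):
    """Agglomerative clustering with average linkage on Euclidean distance.

    samples: list of normalized expression vectors, one per dataset.
    Merges the two closest clusters until n_clusters remain and returns
    the clusters as sorted lists of sample indices.
    """
    if not 1 <= n_clusters <= len(samples):
        raise ValueError("n_clusters must be between 1 and len(samples)")
    # precompute all pairwise sample distances, keyed by (i, j) with i < j
    dist = {(i, j): euclidean(samples[i], samples[j])
            for i, j in combinations(range(len(samples)), 2)}

    def d(i, j):
        return dist[(i, j)] if i < j else dist[(j, i)]

    clusters = [[i] for i in range(len(samples))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # average linkage: mean distance over all cross-cluster pairs
            total = sum(d(i, j) for i in clusters[a] for j in clusters[b])
            avg = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)

# Hypothetical two-gene profiles: samples 0/1 and 2/3 come from two "tissues".
profiles = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(average_linkage(profiles, 2))  # [[0, 1], [2, 3]]
```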
We observed greater similarity among leukemias, myelodysplasia (MDS) and normal HSCs than between these and non-hematopoietic tissues, again validating the discriminatory power of this meta-analysis. In fact, some datasets from MDS bone marrow samples were very similar to normal CD34+ cells and clustered within their groups. We consider this a strong validation of our analysis, as MDS is a preleukemic disorder with varying levels of pathology and can include cases that are genetically very similar to normal hematopoietic stem cells. We next searched for a gene expression signature characteristic of HSCs by finding genes that were uniformly enriched in HSC datasets and, at the same time, differentially expressed relative to normal non-hematopoietic tissues. We found 46 such “stemness” genes in our dataset. Functional pathway analysis with Ingenuity revealed that these genes belonged to cell cycle and hematopoiesis pathways, making it less likely that our findings arose by chance. In addition to known genes such as Gata2, Myb, Lyn kinase and Stat5A, several novel functional genes, including the SWI/SNF family member SMARCE1, Bone marrow stromal antigen 2, Septin 6, Topoisomerase II and H2A histone proteins, were found by our analysis to be enriched in HSCs. Thus, we demonstrate a feasible and valid approach for meta-analysis of publicly available gene expression data that can yield further insights into human physiology and disease.
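The selection criterion described above (uniform enrichment in HSC datasets combined with differential expression against other tissues) can be expressed as a simple filter. In the Python sketch below, uniformity is measured as the standard deviation of log2 expression across HSC samples and differential expression as the difference of mean log2 values; both measures, their cut-offs (`max_sd`, `min_log2_fc`), the function name `stemness_candidates` and the toy data are hypothetical, since the abstract does not state the exact statistics used.

```python
from statistics import mean, pstdev

def stemness_candidates(hsc, other, max_sd=1.0, min_log2_fc=1.0):
    """Genes uniformly expressed in HSC samples and higher than in other tissues.

    hsc, other: {gene: [log2 expression, one value per sample]}
    max_sd, min_log2_fc: hypothetical cut-offs.
    Returns gene names sorted by decreasing log2 fold change.
    """
    hits = []
    for gene, values in hsc.items():
        if gene not in other:
            continue  # gene must be measured in both groups
        spread = pstdev(values)  # uniformity across HSC datasets
        log2_fc = mean(values) - mean(other[gene])  # difference of log2 means
        if spread <= max_sd and log2_fc >= min_log2_fc:
            hits.append((log2_fc, gene))
    return [gene for _, gene in sorted(hits, reverse=True)]

# Hypothetical log2 values: g2 is too variable, g3 not enriched enough.
hsc = {"g1": [8, 8, 8], "g2": [8, 2, 14], "g3": [6, 6, 6], "g4": [10, 10, 10]}
other = {"g1": [5, 5], "g2": [5, 5], "g3": [5.5, 5.5], "g4": [6, 6]}
print(stemness_candidates(hsc, other))  # ['g4', 'g1']
```

A real analysis would normally add a significance test and multiple-testing correction on top of such fixed cut-offs.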
While microarray analysis of global gene expression yields enormous amounts of data, there are concerns about the standardization and validity of the findings. We therefore wanted to determine the variability among gene expression studies of human bone marrow in the literature and to study the factors that account for these differences. We also wanted to determine whether certain genes were consistently and differentially enriched in human bone marrow stem cells. A total of 64 individual datasets (2001–2006) were collected from the Gene Expression Omnibus (GEO) database for our analysis. Most of the datasets had been used as controls in studies of hematological malignancies. Thirteen datasets had been hybridized to the Affymetrix U95 chip, 38 to the Affymetrix human U133A chip and 13 to the U133 Plus 2.0 platform. RNA for these studies was derived from purified normal CD34+ cells in 48 cases and from unsorted normal bone marrow mononuclear cells in 16 cases. To merge data from different platforms, we converted individual probe sequence IDs to RefSeq gene IDs and analyzed them with SAS (SAS Institute, Cary, NC) and the ArrayAssist software package (Stratagene). A total of 23686 unique gene IDs were obtained for analysis after the data were normalized, and a k-nearest-neighbor (KNN) algorithm was used to fill gaps in the data. Our results reveal marked variability in gene expression patterns in this cohort. In hierarchical clustering based on average Euclidean distances, the datasets clustered primarily by the laboratory that performed the assays. Clustering was further defined by the type of chip or platform used for the analysis. Interestingly, the similarity between CD34+ sorted and unsorted whole BM samples was greater than the interplatform similarity between samples of the same cell phenotype. Notwithstanding this variability in gene expression, a novel set of genes was differentially enriched in all 64 samples.
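The KNN gap-filling step can be sketched as follows. This minimal Python version follows the general idea of KNN imputation for expression matrices: for a gene with a missing value, find the k most similar genes (root-mean-square distance over the samples both genes have observed) that do have a value in that sample, and average those values. The choice of k, the distance measure, the function name `knn_impute` and the toy matrix are assumptions; the implementation in ArrayAssist may differ.

```python
import math

def knn_impute(matrix, k=3):
    """Fill None gaps in a genes x samples matrix by KNN imputation.

    For each missing value, the k nearest genes (root-mean-square distance
    over samples observed in both genes) that have a value in that sample
    are averaged. Returns a new matrix; the input is not modified.
    """
    def distance(r1, r2):
        shared = [(a, b) for a, b in zip(r1, r2) if a is not None and b is not None]
        if not shared:
            return math.inf  # no overlap: never used as a donor
        return math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))

    filled = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        # all other genes, nearest first
        neighbors = sorted((distance(row, other), idx)
                           for idx, other in enumerate(matrix) if idx != i)
        for j in missing:
            donors = [matrix[idx][j] for dist, idx in neighbors
                      if dist != math.inf and matrix[idx][j] is not None][:k]
            if donors:  # leave the gap if no gene can donate a value
                filled[i][j] = sum(donors) / len(donors)
    return filled

# Hypothetical 4 genes x 3 samples with one gap
data = [[1.0, 2.0, None], [1.0, 2.0, 3.0], [1.1, 2.1, 5.0], [9.0, 9.0, 9.0]]
print(knn_impute(data, k=2)[0])  # [1.0, 2.0, 4.0]
```

Using genes rather than samples as neighbors exploits co-expression between genes; with only a few samples per platform, that is usually the more informative direction.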
These genes included transcription factors (Kruppel-like factor 6), translational proteins (eukaryotic translation initiation factor 4A, isoform 1; ribosomal proteins) and other proteins not previously implicated in hematopoiesis (guanine nucleotide binding protein (GNAS), Calnexin, HLA-associated proteins, dUTP pyrophosphatase, etc.). Mouse homologues of several of these proteins were found to be overexpressed in a previous well-respected study of mouse hematopoietic stem cells (Ramalho-Santos et al., Science 2002;298(5593)). To further validate these findings, we performed gene expression array analysis on primary bone marrow cells using a completely different platform (NimbleGen 37K arrays) and demonstrated enrichment of the majority of these genes. Thus, we provide a blueprint for conducting similar meta-analyses across various microarray platforms, and our findings disclose tremendous platform- and laboratory-dependent differences in microarray gene expression patterns. In spite of this variability, data mining of discrete datasets can be a useful tool for gene discovery. Finally, we are constructing a publicly searchable database of normal human bone marrow gene expression, which may serve as a source of controls for gene expression studies of hematopoietic malignancies by other investigators.