Multi-way clustering of microarray data using probabilistic sparse matrix factorization

Dueck, Delbert; Morris, Quaid; Frey, Brendan J.

doi:10.1093/bioinformatics/bti1041

Cited by 58 publications

(36 citation statements)

References 12 publications


“…Lower-dimensional projections and decompositions of DNA microarray data, such as principal component analysis, singular value decomposition, and NMF, have been used to analyze transcriptional states (3, 33–37). Primarily, these approaches were applied in the context of a single data set for clustering or visualization.…”

Section: Discussion (mentioning)

confidence: 99%

Metagene projection for cross-platform, cross-species characterization of global transcriptional states

Tamayo

Scanfeld

Ebert

et al. 2007

Proc. Natl. Acad. Sci. U.S.A.


The high dimensionality of global transcription profiles, the expression level of 20,000 genes in a much smaller number of samples, presents challenges that affect the sensitivity and general applicability of analysis results. In principle, it would be better to describe the data in terms of a small number of metagenes, positive linear combinations of genes, which could reduce noise while still capturing the invariant biological features of the data. Here, we describe how to accomplish such a reduction in dimension by a metagene projection methodology, which can greatly reduce the number of features used to characterize microarray data. We show, in applications to the analysis of leukemia and lung cancer data sets, how this approach can help assess and interpret similarities and differences between independent data sets, enable cross-platform and cross-species analysis, improve clustering and class prediction, and provide a computational means to detect and remove sample contamination.

cancer | dimension reduction | expression analysis | noise reduction | sample contamination

A major challenge in the analysis of global transcription profiles is the high level of noise and the lack of reproducibility across data sets, which results from fitting models to small numbers of samples in a high-dimensional space (i.e., thousands of genes). Ideally we would prefer to reduce the data to a small number of metagenes that better capture the essential behavior of the samples.

There are many advantages to such a metagene approach. By capturing the major, invariant biological features and reducing noise, metagenes provide descriptions of data sets that allow them to be more easily combined and compared. This is especially important when we are considering cross-platform or cross-species data. Ultimately, this can result in more sensitive clustering and classification. In addition, interpretation of the metagenes, which characterize a subtype or subset of samples, can give us insight into underlying mechanisms and processes of a disease.

Here, we describe a general methodology, metagene projection, that creates a low-dimensional representation of a training (model) data set using nonnegative metagene factors into which an independently obtained new (test) set of samples or data can be projected and analyzed. The metagene factors are a small number of gene combinations that distinguish expression patterns of subclasses in a data set. We obtain the factors by the application of nonnegative matrix factorization (NMF) (1, 2) used to extract facial features from images. We showed (3) how NMF can extract metagenes that provide stable, robust clustering of expression data. Moreover, by using gene set enrichment analysis (GSEA) to annotate the metagene factors themselves, we can gain insight into the underlying biology of both the training and test data sets.

Importantly, we illustrate the utility of metagene projection by its application to leukemia and lung cancer data sets. We show how the projection of new data sets into the space of meta...
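To make the projection idea in the abstract above more concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes standard multiplicative-update NMF for the training factorization and a clipped pseudo-inverse for projecting new samples; the function names (`nmf_multiplicative`, `project_samples`) and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np

def nmf_multiplicative(A, k, n_iter=500, seed=0, eps=1e-9):
    """Factor a nonnegative matrix A (genes x samples) as W @ H.

    W (genes x k) holds the metagene factors, H (k x samples) the
    metagene expression of each sample. Uses multiplicative updates
    for the Frobenius-norm objective (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

def project_samples(W, A_new):
    """Project new samples into metagene space.

    Assumption: projection via the Moore-Penrose pseudo-inverse of W,
    with negative coefficients clipped to zero.
    """
    return np.clip(np.linalg.pinv(W) @ A_new, 0.0, None)

# Synthetic stand-in for a training set and an independent test set
# that share the same three underlying metagenes.
rng = np.random.default_rng(1)
W_true = rng.random((200, 3))
train = W_true @ rng.random((3, 30))
test = W_true @ rng.random((3, 10))

W, H = nmf_multiplicative(train, k=3)
H_test = project_samples(W, test)
rel_err = np.linalg.norm(train - W @ H) / np.linalg.norm(train)
print(W.shape, H_test.shape, round(float(rel_err), 4))
```

Clustering or class prediction would then operate on the columns of `H` and `H_test` (3 values per sample here) rather than on the full gene-level profiles.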

“…Although the results presented in [21] show that the computed NMF generated parts-based basis vectors, the generation of a parts-based basis by the NMF depends on the data and the algorithm [14,23]. Several approaches [7,14,29,30] have been proposed to explicitly control the degree of sparseness in the factors of the NMF. In this section, we propose algorithms for the sparse NMF that follows the framework of the two block coordinate descent methods and therefore guarantees that every limit point is a stationary point.…”

Section: Algorithms For Sparse NMF Based On Alternating Non-negativit... (mentioning)

confidence: 99%

Nonnegative Matrix Factorization Based on Alternating Nonnegativity Constrained Least Squares and Active Set Method

Kim

Park

2008

SIAM J. Matrix Anal. & Appl.


Abstract. The non-negative matrix factorization (NMF) determines a lower rank approximation of a matrix $A \in \mathbb{R}^{m \times n} \approx WH$ where an integer $k \ll \min(m, n)$ is given and nonnegativity is imposed on all components of the factors $W \in \mathbb{R}^{m \times k}$ and $H \in \mathbb{R}^{k \times n}$. The NMF has attracted much attention for over a decade and has been successfully applied to numerous data analysis problems. In applications where the components of the data are necessarily nonnegative such as chemical concentrations in experimental results or pixels in digital images, the NMF provides a more relevant interpretation of the results since it gives non-subtractive combinations of non-negative basis vectors. In this paper, we introduce an algorithm for the NMF based on alternating non-negativity constrained least squares (NMF/ANLS) and the active set based fast algorithm for non-negativity constrained least squares with multiple right hand side vectors, and discuss its convergence properties and a rigorous convergence criterion based on the Karush-Kuhn-Tucker (KKT) conditions. In addition, we also describe algorithms for sparse NMFs and regularized NMF. We show how we impose a sparsity constraint on one of the factors by $L_1$-norm minimization and discuss its convergence properties. Our algorithms are compared to other commonly used NMF algorithms in the literature on several test data sets in terms of their convergence behavior.
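The alternating scheme named in this abstract can be sketched as follows. This is only an illustration of the two-block structure, not the paper's algorithm: it assumes SciPy's `scipy.optimize.nnls` as the nonnegative least squares solver and calls it one column at a time, whereas the paper describes a faster active-set method for multiple right-hand sides. The helper names `nnls_columns` and `nmf_anls` are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def nnls_columns(C, B):
    """Solve min_X ||C @ X - B||_F subject to X >= 0, column by column."""
    return np.column_stack([nnls(C, B[:, j])[0] for j in range(B.shape[1])])

def nmf_anls(A, k, n_iter=30, seed=0):
    """Approximate nonnegative A (m x n) by W @ H with W, H >= 0.

    Each iteration fixes one factor and solves a nonnegativity
    constrained least squares problem for the other.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], k))
    for _ in range(n_iter):
        H = nnls_columns(W, A)           # fix W, solve for H >= 0
        W = nnls_columns(H.T, A.T).T     # fix H, solve for W >= 0
    return W, H

# Small synthetic example with an exact nonnegative rank-4 structure.
rng = np.random.default_rng(2)
A = rng.random((20, 4)) @ rng.random((4, 15))
W, H = nmf_anls(A, k=4)
rel_err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
print(W.shape, H.shape, round(float(rel_err), 4))
```

Because each subproblem is solved exactly, the Frobenius-norm error cannot increase from one half-step to the next, which is the property the two-block coordinate descent framework relies on.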


“…The mean vectors $e_r$ of this truncated MGD are provided by an EEA dedicated to hyperspectral imagery and the variances $s_r^2$ are fixed to a large value. To summarize, the prior for $t_r$ is $t_r \sim \mathcal{N}_{T_r}(e_r, s_r^2 I_{R-1})$ (5) where $\mathcal{N}_{T_r}(e_r, s_r^2 I_{R-1})$ denotes the truncated MGD with mean $e_r$ and covariance matrix $s_r^2 I_{R-1}$. Fig.…”

Section: Unsupervised Bayesian Linear Unmixing (mentioning)

confidence: 99%

“…These methods include non-negative matrix factorization (NMF) [3], independent component analysis (ICA) [4], bi-clustering [5], PCA, penalized matrix decomposition (PMD) [2], and Bayesian factor regression modeling (BFRM) [1]. Contrary to BLU, the PCA, ICA, BFRM, bi-clustering and PMD methods do not account for nonnegativity of the factor loadings and factor scores.…”

Section: Introduction (mentioning)

confidence: 99%

Unsupervised Bayesian analysis of gene expression patterns

Bazot

Dobigeon

Tourneret

et al. 2010

2010 Conference Record of the Forty Fourth Asilomar Conference on Signals, Systems and Computers


In this paper we introduce a new method for analyzing expression patterns from high throughput and complex data such as gene expression microarrays. These microarrays are collected under different conditions such as time, phenotype and treatment. The proposed method uses a Bayesian matrix decomposition, called Bayesian linear unmixing (BLU), to extract a set of characteristic gene signatures, or factors, and a set of coefficients, factor scores, that specify the relative contribution of each signature to a specific sample. BLU is related to Bayesian factor analysis but differs in an important respect: BLU constrains the factor loadings to be non-negative and the factor scores to be probability distributions over the factors. Thus BLU reduces the multiplexing of genes into different factors and can enhance interpretability of the factor loadings and factor scores. The unsupervised version of BLU presented in this paper also provides estimates of the number of factors. We illustrate the application of BLU to bioinformatics by analyzing gene expression microarray datasets.
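The constraints described in this abstract can be written compactly as below. The notation is an assumption made for illustration and is not taken from the abstract: $Y$ is the genes-by-samples expression matrix, $M$ holds the factor loadings (one gene signature per column), $A$ holds the factor scores, $N$ is additive noise, and $R$ is the number of factors.

```latex
Y = M A + N, \qquad
m_{g,r} \ge 0 \quad \text{for all } g, r, \qquad
a_{r,i} \ge 0, \quad \sum_{r=1}^{R} a_{r,i} = 1 \quad \text{for all } i .
```

Under this reading, each sample is a convex combination of the $R$ signatures, which is what distinguishes the model from unconstrained factor analysis.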
