We describe a simple and robust algorithm for the clustering of temporal gene expression profiles that is based on the simulated annealing procedure. In general, this algorithm is guaranteed to eventually find the globally optimal distribution of genes over clusters. We introduce an iterative scheme that serves to evaluate quantitatively the optimal number of clusters for each specific data set. The scheme is based on standard approaches used in regular statistical tests. The basic idea is to organize the search for the optimal number of clusters simultaneously with the optimization of the distribution of genes over clusters. The efficiency of the proposed algorithm has been evaluated by means of a reverse engineering experiment, that is, a situation in which the correct distribution of genes over clusters is known a priori. This statistically rigorous test has shown that our algorithm places more than 90% of genes into the correct clusters. Finally, the algorithm has been tested on real gene expression data (expression changes during the yeast cell cycle) for which the fundamental patterns of gene expression and the assignment of genes to clusters are well understood from numerous previous studies.
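To make the annealing idea in this abstract concrete, the following is a minimal sketch of simulated-annealing clustering for a fixed number of clusters. It is not the authors' implementation: the cost function (within-cluster sum of squared distances), the geometric cooling schedule, the single-gene reassignment move, and all names and default parameters (`anneal_clusters`, `t_start`, `cooling`, `moves_per_temp`) are assumptions chosen for illustration. The abstract's scheme for choosing the number of clusters is not reproduced here.

```python
import math
import random


def within_cluster_cost(profiles, labels, k):
    """Sum of squared distances from each profile to its cluster centroid."""
    dim = len(profiles[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for profile, c in zip(profiles, labels):
        counts[c] += 1
        for j, value in enumerate(profile):
            sums[c][j] += value
    cost = 0.0
    for profile, c in zip(profiles, labels):
        # counts[c] is at least 1 here because this profile belongs to c.
        for j, value in enumerate(profile):
            cost += (value - sums[c][j] / counts[c]) ** 2
    return cost


def anneal_clusters(profiles, k, t_start=1.0, t_end=1e-3, cooling=0.95,
                    moves_per_temp=200, seed=0):
    """Assign each profile to one of k clusters by simulated annealing.

    All parameters are illustrative assumptions, not values from the paper.
    Returns the best labeling seen and its cost.
    """
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in profiles]
    cost = within_cluster_cost(profiles, labels, k)
    best_labels, best_cost = labels[:], cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            # Propose moving one randomly chosen gene to another cluster.
            i = rng.randrange(len(profiles))
            old = labels[i]
            new = rng.randrange(k)
            if new == old:
                continue
            labels[i] = new
            new_cost = within_cluster_cost(profiles, labels, k)
            delta = new_cost - cost
            # Metropolis rule: always accept improvements, sometimes accept
            # worse states so the search can escape local minima.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best_labels, best_cost = labels[:], cost
            else:
                labels[i] = old  # reject the move
        t *= cooling  # geometric cooling (an assumed schedule)
    return best_labels, best_cost
```

Calling `anneal_clusters(profiles, k=2)` on a list of equal-length expression profiles returns a label per gene and the final cost. Recomputing the full cost after every move keeps the sketch short; a practical version would update the cost incrementally.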
A general and detailed noise model for the DNA microarray measurement of gene expression is presented and used to derive a Bayesian estimation scheme for expression ratios, implemented in a program called PFOLD, which provides not only an estimate of the fold-change in gene expression, but also confidence limits for the change and a P-value quantifying the significance of the change. Although the focus is on oligonucleotide microarray technologies, the scheme can also be applied to cDNA based technologies if parameters for the noise model are provided. The model unifies estimation for all signals in that it provides a seamless transition from very low to very high signal-to-noise ratios, an essential feature for current microarray technologies for which the median signal-to-noise ratios are always moderate. The dual use, as decision statistics in a two-dimensional space, of the P-value and the fold-change is shown to be effective in the ubiquitous problem of detecting changing genes against a background of unchanging genes, leading to markedly higher sensitivities, at equal selectivity, than detection and selection based on the fold-change alone, a current practice until now.
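The dual use of the P-value and the fold-change as decision statistics can be illustrated with a small sketch. This is not PFOLD and does not implement its noise model or Bayesian estimator. It assumes the fold-change and P-value for each gene have already been estimated, and it uses a simple rectangular acceptance region, which is only one possible decision boundary in the two-dimensional space. The function name `select_changing_genes` and both thresholds are hypothetical.

```python
import math


def select_changing_genes(estimates, min_abs_log2_fold=1.0, max_p_value=0.01):
    """Return genes that pass both a fold-change and a P-value threshold.

    `estimates` maps a gene name to a (fold_change, p_value) pair, where
    fold_change is a positive expression ratio. The thresholds are
    illustrative assumptions, not values taken from the paper.
    """
    selected = []
    for gene, (fold_change, p_value) in estimates.items():
        # Use log2 so that up- and down-regulation are treated symmetrically.
        large_change = abs(math.log2(fold_change)) >= min_abs_log2_fold
        significant = p_value <= max_p_value
        if large_change and significant:
            selected.append(gene)
    return selected
```

Under these assumed thresholds, a gene with a four-fold change but a P-value of 0.5 is rejected, whereas selection on the fold-change alone would have accepted it.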