The CRASSS plug-in for integrating annotation data with hierarchical clustering results

Buehler, Eugen; Sachs, Jeffrey R.; Shao, Kui; Bagchi, Ann D.; Ungar, Lyle H.

doi:10.1093/bioinformatics/bth362

Cited by 12 publications

(12 citation statements)

References 12 publications


“…To our knowledge, in the whole bioinformatics literature there have been only four recent studies in which it has been attempted to establish general methods to compare hierarchical and non-hierarchical classifications, all of them in the context of microarray data analysis. In two of these studies [15, 16], the method is similar, and very much related to those used for the two simpler cases discussed above and exemplified in Figures 1A and 1B. Starting with a hierarchical classification of expression data, which may be obtained with any conventional method, such as UPGMA, the degree of enrichment for a particular class (i.e.…”

Section: Introduction (mentioning)

confidence: 99%

“…The process is repeated until all non-overlapping clusters with small p values are determined. Finally, Bonferroni's correction is used to take into account the effect of multiple tests either considering the number of classes tested [15] or the number of clusters tested [16]. A third study followed the same strategy, but only up to the calculation of the p values, without further refinement of the results [17].…”

Section: Introduction (mentioning)

confidence: 99%
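The enrichment-then-correction strategy described in the statement above can be sketched as follows. This is a minimal illustration, not the CRASSS implementation or the procedure of any cited study: it assumes a one-sided hypergeometric test for enrichment (the excerpt does not name the test), and the function names `hypergeom_tail` and `bonferroni` are hypothetical.

```python
from math import comb


def hypergeom_tail(population, annotated, cluster_size, hits):
    """P(X >= hits) when a cluster of `cluster_size` genes is drawn at random
    from `population` genes, of which `annotated` carry the class of interest.

    Assumption: enrichment is scored with a one-sided hypergeometric test.
    """
    total = comb(population, cluster_size)
    upper = min(annotated, cluster_size)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, cluster_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / total


def bonferroni(p_values):
    """Multiply each p value by the number of tests, capping the result at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical example: three clusters tested against one annotation class
# in a population of 20 genes, 5 of which carry the annotation.
raw = [
    hypergeom_tail(20, 5, 5, 5),  # all 5 cluster members are annotated
    hypergeom_tail(20, 5, 5, 1),  # at least 1 of 5 is annotated
    hypergeom_tail(20, 5, 10, 4),  # at least 4 of 10 are annotated
]
adjusted = bonferroni(raw)
```

Whether the number of tests is taken as the number of classes or the number of clusters is the choice that distinguishes the two studies cited in the statement; here it is simply the length of the list passed to `bonferroni`.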

“…Although we determined that the results obtained in those two works were biologically meaningful, an obvious question to be solved was to establish a standard procedure to determine whether the hierarchical classification obtained was congruent with other classifications (such as GO, division in protein complexes, etc). In this work, we describe a method that follows on the steps of previous studies [15, 16], but improves the characterization of the significant classes by using permutation tests that take into account the topology of the hierarchical classification. The method is applied to several cases and, most especially, to explore a hierarchical representation of the mitochondrial interactome, characterizing the clusters that correspond to known protein complexes.…”

Section: Introduction (mentioning)

confidence: 99%


A general strategy to determine the congruence between a hierarchical and a non-hierarchical classification

Marco, Marín. 2007. BMC Bioinformatics

Background: Classification procedures are widely used in phylogenetic inference, the analysis of expression profiles, the study of biological networks, etc. Many algorithms have been proposed to establish the similarity between two different classifications of the same elements. However, methods to determine significant coincidences between hierarchical and non-hierarchical partitions are still poorly developed, in spite of the fact that the search for such coincidences is implicit in many analyses of massive data.


“…Traditionally, the known annotations are used only as a second step, after data have been clustered according to their variation patterns. Only those clusters in which many genes (and proteins/metabolites) are annotated within the same category (for example, the same MapMan BIN [14] or Gene Ontology (GO) terms [15]), are then selected for further analysis [16-19]. For each pattern, its annotations and memberships to well-known metabolic pathways are generally assessed.…”

Section: Introduction (mentioning)

confidence: 99%

Improving clustering with metabolic pathway data

et al. 2014


Background: It is common practice in bioinformatics to validate each group returned by a clustering algorithm through manual analysis, according to a priori biological knowledge. This procedure helps to find functionally related patterns and to propose hypotheses for their behavior and the biological processes involved. This knowledge is therefore used only as a second step, after the data have been clustered according to their expression patterns. It could thus be very useful to improve the clustering of biological data by incorporating prior knowledge into the cluster formation itself, in order to enhance the biological value of the clusters.

Results: A novel training algorithm for clustering is presented, which evaluates the biological internal connections of the data points while the clusters are being formed. Within this training algorithm, the calculation of distances between data points and neuron centroids includes a new term based on information from well-known metabolic pathways. Standard self-organizing map (SOM) training and biologically inspired SOM (bSOM) training were tested with two real data sets of transcripts and metabolites from Solanum lycopersicum and Arabidopsis thaliana. Classical data mining validation measures were used to evaluate the clustering solutions obtained by both algorithms. Moreover, a new measure that takes into account the biological connectivity of the clusters was applied. The results of bSOM show important improvements in convergence and performance for the proposed clustering method in comparison to standard SOM training, in particular from the application point of view.

Conclusions: Analyses of the clusters obtained with bSOM indicate that including biological information during training can increase the biological value of the clusters found with the proposed method. It is worth highlighting that this has effectively improved the results, which can simplify their further analysis. The algorithm is available as a web demo at http://fich.unl.edu.ar/sinc/web-demo/bsom-lite/. The source code and the data sets supporting the results of this article are available at http://sourceforge.net/projects/sourcesinc/files/bsom.


“…Only those clusters in which many genes are annotated with the same annotation (e.g. the same biological process), are then selected for further analysis (Buehler, 2004; Curtis, 2005; Doherty, 2006; Toronen, 2004; and others). Fang et al (2006) took the opposite approach, first mapping the genes involved in the expression dataset to the GO hierarchy, and then looking only at those GO terms for which the mapped genes show high expression similarity.…”

Section: Introduction (mentioning)

confidence: 99%

Seeing the forest for the trees: using the Gene Ontology to restructure hierarchical clustering

2009


Motivation: There is a growing interest in improving the cluster analysis of expression data by incorporating into it prior knowledge, such as the Gene Ontology (GO) annotations of genes, in order to improve the biological relevance of the clusters that are subjected to subsequent scrutiny. The structure of the GO is another source of background knowledge that can be exploited through the use of semantic similarity.

Results: We propose here a novel algorithm that integrates semantic similarities (derived from the ontology structure) into the procedure of deriving clusters from the dendrogram constructed during expression-based hierarchical clustering. Our approach can handle the multiple annotations, from different levels of the GO hierarchy, which most genes have. Moreover, it treats annotated and unannotated genes in a uniform manner. Consequently, the clusters obtained by our algorithm are characterized by significantly enriched annotations. In both cross-validation tests and when using an external index such as protein–protein interactions, our algorithm performs better than previous approaches. When applied to human cancer expression data, our algorithm identifies, among others, clusters of genes related to immune response and glucose metabolism. These clusters are also supported by protein–protein interaction data.

Contact: dotna@cs.bgu.ac.il

Supplementary information: Supplementary data are available at Bioinformatics online.
