Microarray experiments have been extensively used for simultaneously measuring DNA expression levels of thousands of genes in genome research. A key step in the analysis of gene expression data is the clustering of genes into groups that show similar expression values over a range of conditions. Since only a small subset of the genes participate in any cellular process of interest, by focusing on subsets of genes and conditions, we can lower the noise induced by other genes and conditions; a co-cluster characterizes such a subset of interest. Cheng and Church [3] introduced an effective measure of co-cluster quality based on mean squared residue. In this paper, we use two similar squared residue measures and propose two fast k-means-like co-clustering algorithms corresponding to the two residue measures. Our algorithms discover k row clusters and l column clusters simultaneously while monotonically decreasing the respective squared residues. Our co-clustering algorithms inherit the simplicity, efficiency and wide applicability of the k-means algorithm. Minimizing the residues may also be formulated as trace optimization problems that allow us to obtain a spectral relaxation, which we use as a principled initialization for our iterative algorithms. We further enhance our algorithms with an incremental local search strategy that helps avoid empty clusters and escape poor local minima. We illustrate co-clustering results on a yeast cell cycle dataset and a human B-cell lymphoma dataset. Our experiments show that our co-clustering algorithms are efficient and are able to discover coherent co-clusters.
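As a rough illustration of the quality measure mentioned above, the sketch below computes the mean squared residue of a single co-cluster in the form commonly attributed to Cheng and Church [3]: each entry's residue is the entry minus its row mean, minus its column mean, plus the overall mean of the co-cluster. The function name, the list-of-lists data layout, and the toy matrices are assumptions made for this example. It is not the authors' implementation, and it does not perform the k-means-like updates or the spectral initialization.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the co-cluster given by row and column indices.

    Residue of entry (i, j): a_ij - rowmean_i - colmean_j + overall_mean,
    with all means taken inside the selected submatrix.
    """
    n_rows, n_cols = len(rows), len(cols)
    row_mean = {i: sum(matrix[i][j] for j in cols) / n_cols for i in rows}
    col_mean = {j: sum(matrix[i][j] for i in rows) / n_rows for j in cols}
    overall = sum(row_mean.values()) / n_rows

    total = 0.0
    for i in rows:
        for j in cols:
            residue = matrix[i][j] - row_mean[i] - col_mean[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)


# Toy data (made up): rows differ only by an additive shift, so the
# co-cluster is perfectly coherent and the residue is (numerically) zero.
coherent = [[1.0, 2.0, 3.0],
            [2.0, 3.0, 4.0],
            [5.0, 6.0, 7.0]]
# Toy data (made up): a checkerboard pattern, which is not additive.
incoherent = [[1.0, 0.0],
              [0.0, 1.0]]

msr_coherent = mean_squared_residue(coherent, [0, 1, 2], [0, 1, 2])
msr_incoherent = mean_squared_residue(incoherent, [0, 1], [0, 1])
```

A lower value indicates a more coherent co-cluster, which is why an algorithm that monotonically decreases the squared residue moves toward tighter groupings of genes and conditions.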
Background: Prokaryotic translation initiation involves the proper docking, anchoring, and accommodation of mRNA to the 30S ribosomal subunit. Three initiation factors (IF1, IF2, and IF3) and some ribosomal proteins mediate the assembly and activation of the translation initiation complex. Although the interaction between the Shine-Dalgarno (SD) sequence and its complementary sequence in the 16S rRNA is important in initiation, some genes lacking an SD ribosome binding site (RBS) are still well expressed. The objective of this study is to examine the pattern of distribution and diversity of RBS in fully sequenced bacterial genomes. The following three hypotheses were tested: SD motifs are prevalent in bacterial genomes; all previously identified SD motifs are uniformly distributed across prokaryotes; and genes with specific Clusters of Orthologous Groups (COG) functions differ in their use of SD motifs. Results: Data for 2,458 bacterial genomes, previously generated by Prodigal (PROkaryotic DYnamic programming Gene-finding ALgorithm) and currently available at the National Center for Biotechnology Information (NCBI), were analyzed. Of the total genes examined, ~77.0% use an SD RBS, while ~23.0% have no RBS. The majority of the genes with the most common SD motifs are distributed in a manner that is representative of their abundance for each COG functional category, while motifs 13 (5′-GGA-3′/5′-GAG-3′/5′-AGG-3′) and 27 (5′-AGGAGG-3′) appear to be predominantly used by genes for information storage and processing, and translation and ribosome biogenesis, respectively. Conclusion: These findings suggest that an SD sequence is not obligatory for translation initiation; instead, other signals, such as the RBS spacer, may have an overarching influence on translation of mRNAs. Subsequent analyses of the 5′ secondary structure of these mRNAs may provide further insight into the translation initiation mechanism.
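The tally described in the results (the share of genes using each SD motif versus no RBS) can be sketched roughly as follows. This is a minimal illustration, assuming gene records carry Prodigal-style `key=value;` attribute strings with an `rbs_motif` field whose value is `None` when no RBS was found. The helper names and the four sample records are hypothetical, and this is not the pipeline used in the study.

```python
from collections import Counter


def rbs_motif(attributes):
    """Extract the rbs_motif value from a 'key=value;key=value' string.

    Returns the string "None" if the field is absent (assumption: this
    mirrors how a missing RBS is reported).
    """
    fields = dict(
        item.split("=", 1)
        for item in attributes.strip(";").split(";")
        if "=" in item
    )
    return fields.get("rbs_motif", "None")


def motif_usage(attribute_lines):
    """Fraction of genes assigned to each RBS motif."""
    counts = Counter(rbs_motif(line) for line in attribute_lines)
    total = sum(counts.values())
    return {motif: n / total for motif, n in counts.items()}


# Hypothetical records, invented for this example only.
records = [
    "ID=1_1;start_type=ATG;rbs_motif=AGGAGG;rbs_spacer=5-10bp",
    "ID=1_2;start_type=ATG;rbs_motif=GGA/GAG/AGG;rbs_spacer=5-10bp",
    "ID=1_3;start_type=GTG;rbs_motif=None;rbs_spacer=None",
    "ID=1_4;start_type=ATG;rbs_motif=AGGAGG;rbs_spacer=3-4bp",
]

usage = motif_usage(records)
sd_fraction = 1.0 - usage.get("None", 0.0)  # genes with any SD-type motif
```

Grouping the same counts by COG functional category would follow the same pattern with a two-level counter, provided each record also carries a category label.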
Clustering problems often involve datasets where only a part of the data is relevant to the problem, e.g., in microarray data analysis only a subset of the genes show cohesive expressions within a subset of the conditions/features. The existence of a large number of non-informative data points and features makes it challenging to hunt for coherent and meaningful clusters in such datasets. Additionally, since clusters could exist in different subspaces of the feature space, a co-clustering algorithm that simultaneously clusters objects and features is often more suitable than one that is restricted to traditional "one-sided" clustering. We propose Robust Overlapping CoClustering (ROCC), a scalable and very versatile framework that addresses the problem of efficiently mining dense, arbitrarily positioned, possibly overlapping co-clusters from large, noisy datasets. ROCC has several desirable properties that make it extremely well suited to a number of real-life applications.
Protein phosphatase 2A (PP2A) is an abundant serine/threonine phosphatase that functions as a tumor suppressor in numerous cell-cell signaling pathways, including Wnt, myc, and ras. The B56 subunit of PP2A regulates its activity, and is encoded by five genes in humans. B56 proteins share a central core domain, but have divergent amino- and carboxy-termini, which are thought to provide isoform specificity. We performed phylogenetic analyses to better understand the evolution of the B56 gene family. We found that B56 was present as a single gene in eukaryotes prior to the divergence of animals, fungi, protists, and plants, and that B56 gene duplication prior to the divergence of protostomes and deuterostomes led to the origin of two B56 subfamilies, B56αβε and B56γδ. Further duplications led to three B56αβε genes and two B56γδ in vertebrates. Several nonvertebrate B56 gene names are based on distinct vertebrate isoform names, and would best be renamed. B56 subfamily genes lack significant divergence within primitive chordates, but each became distinct in complex vertebrates. Two vertebrate lineages have undergone B56 gene loss, Xenopus and Aves. In Xenopus, B56δ function may be compensated for by an alternatively spliced transcript, B56δ/γ, encoding a B56δ-like amino-terminal region and a B56γ core.
Aurora kinases (AKs) are serine/threonine kinases that are essential for cell division. Humans have three AK genes: AKA, AKB, and AKC. AKA is required for centrosome assembly, centrosome separation, and bipolar spindle assembly, and its mutation leads to abnormal spindle morphology. AKB is required for the spindle checkpoint and proper cytokinesis, and mutations cause chromosome misalignment and cytokinesis failure. AKC is expressed in germ cells, and has a role in meiosis analogous to that of AKB in mitosis. Mutation of any of the three isoforms can lead to cancer. AK proteins possess divergent N- and C-termini and a conserved central catalytic domain. We examined the evolution of the AK gene family using an identity matrix and by building a phylogenetic tree. The data suggest that AKA is the vertebrate ancestral gene, and that AKB and AKC resulted from gene duplication in placental mammals. In a nonsynonymous/synonymous substitution rate analysis, we found that AKB experienced the strongest, and AKC the weakest, purifying selection. Both the N- and C-termini and regions within the kinase domain experienced differential selection among the AK isoforms. These differentially selected sequences may be important for species specificity and isoform specificity, and are therefore potential therapeutic targets.
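An identity matrix of the kind mentioned above can be illustrated with a small sketch that computes pairwise percent identity over pre-aligned sequences. The function names, the choice to skip any column containing a gap in either sequence (one convention among several), and the toy sequences are all assumptions for this example. They are not real Aurora kinase sequences and not the authors' procedure.

```python
def percent_identity(a, b):
    """Percent identity of two aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def identity_matrix(aligned):
    """Nested dict of pairwise percent identities, keyed by sequence name."""
    names = list(aligned)
    return {
        p: {q: percent_identity(aligned[p], aligned[q]) for q in names}
        for p in names
    }


# Toy alignment, invented for illustration only.
toy_alignment = {
    "seqA": "MKV-LT",
    "seqB": "MKVALT",
    "seqC": "MRV-LS",
}

matrix = identity_matrix(toy_alignment)
```

Comparing such a matrix computed separately for the termini and for the central catalytic domain is one simple way to see which regions are conserved and which have diverged.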