We wish to identify genes associated with disease. To do so, we look for novel genes whose expression patterns mimic those of known disease-associated genes, using a method we call Guilt-by-Association (GBA), on the basis of a combinatoric measure of association. Using GBA, we have examined the expression of 40,000 human genes in 522 cDNA libraries, and have discovered several hundred previously unidentified genes associated with cancer, inflammation, steroid-synthesis, insulin-synthesis, neurotransmitter processing, matrix remodeling, and other disease processes. The majority of the genes thus discovered show no sequence similarity to known genes, and thus could not have been identified by homology searches. We present here an example of the discovery of eight genes associated with prostate cancer. Of the 40,000 most-abundant human genes, these 8 are the most closely linked to the known diagnostic genes, and thus are prime targets for pharmaceutical research.
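The abstract does not spell out the combinatoric measure of association. As a minimal sketch, assuming each gene is reduced to the set of libraries in which it is detected, one plausible choice is the hypergeometric tail probability of seeing at least the observed number of shared libraries by chance. The function name `cooccurrence_pvalue` and the toy library indices are hypothetical and are not taken from the paper.

```python
from math import comb

def cooccurrence_pvalue(libs_a, libs_b, n_libraries):
    """Probability that two genes share at least the observed number of
    libraries if their occurrences were independent (hypergeometric tail).

    libs_a, libs_b: sets of library indices in which each gene is detected.
    n_libraries: total number of libraries examined.
    """
    a, b = len(libs_a), len(libs_b)
    observed = len(libs_a & libs_b)
    total = comb(n_libraries, b)  # ways to place gene B in b libraries
    tail = sum(
        comb(a, i) * comb(n_libraries - a, b - i)
        for i in range(observed, min(a, b) + 1)
    )
    return tail / total

# Toy example (illustrative only): 10 libraries, two genes seen in the same 3.
p_same = cooccurrence_pvalue({0, 1, 2}, {0, 1, 2}, 10)
# Two genes that never co-occur: the tail covers every outcome.
p_disjoint = cooccurrence_pvalue({0, 1, 2}, {3, 4, 5}, 10)
```

Under this assumption, candidate genes would be ranked by ascending probability against each known disease-associated gene, with the smallest values indicating the tightest association.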
Databases of experimentally determined protein interactions provide information on binary interactions and on involvement in multiprotein complexes. These data are valuable for understanding the general properties of the interaction between proteins as well as for the development of prediction schemes for unknown interactions. Here we analyze experimentally determined protein interactions by measuring various sequence, genomic, transcriptomic, and proteomic attributes of each interacting pair in the yeast Saccharomyces cerevisiae. We find that dividing the data into two groups, one that includes binary interactions within protein complexes (stable) and another that includes binary interactions that are not within complexes (transient), enables better characterization of the interactions by the different attributes and improves the prediction of new interactions. This analysis revealed that most attributes were more indicative in the set of intracomplex interactions. Using this data set for training, we integrated the different attributes by logistic regression and developed a predictive scheme that distinguishes between interacting and noninteracting protein pairs. Analysis of the logistic-regression model showed that one of the strongest contributors to the discrimination between interacting and noninteracting pairs is the presence of distinct pairs of domain signatures that were suggested previously to characterize interacting proteins. The predictive algorithm succeeds in identifying both intracomplex and other interactions (possibly the more stable ones), and its correct identification rate is 2-fold higher than that of large-scale yeast two-hybrid experiments.
domain signature | genomewide analysis | stable interaction | transient interaction | logistic regression
Protein interactions are central to almost all biological processes.
Large-scale screens of protein-protein interactions (PPIs) in several organisms (1-4), together with PPI data from small-scale studies, have generated a large volume of experimental data that provides a partial picture of the cellular PPI networks. Previous studies that analyzed PPIs characterized their sequence domains and cellular properties (5-12) and provided insight into their evolution and regulation (13-23). At present, the richest information on PPIs is available for the yeast Saccharomyces cerevisiae, including documentation on experimentally determined binary interactions (1, 2, 24-26) as well as participation of proteins in the same complex (24, 27, 28). Intersection of these two data sources divides the binary interactions into those that occur within larger protein complexes [intracomplex interactions (ICIs)] and those that were not documented as belonging to complexes [non-intracomplex interactions (NICIs)]. The latter include interactions between proteins in different complexes, interactions between a noncomplexed protein and a protein in a complex, and interactions between two noncomplexed proteins (Fig. 1). A possible distinction between the ICIs and NICIs is the n...
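The intersection of the two data sources described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: interactions are taken as unordered protein pairs, complexes as a mapping from a complex name to its member set, and the identifiers (`P1`, `cplxA`, and so on) are placeholders rather than real yeast proteins.

```python
def classify_interactions(pairs, complexes):
    """Split binary interactions into intracomplex (ICI) and
    non-intracomplex (NICI) sets.

    pairs: iterable of (protein_a, protein_b) tuples.
    complexes: dict mapping complex name -> set of member proteins.
    A pair is an ICI if at least one complex contains both partners.
    """
    ici, nici = [], []
    for a, b in pairs:
        same_complex = any(
            a in members and b in members for members in complexes.values()
        )
        (ici if same_complex else nici).append((a, b))
    return ici, nici

# Placeholder data: two complexes and one noncomplexed protein (P5).
complexes = {"cplxA": {"P1", "P2"}, "cplxB": {"P3", "P4"}}
pairs = [("P1", "P2"), ("P1", "P3"), ("P4", "P5"), ("P5", "P6")]
ici, nici = classify_interactions(pairs, complexes)
```

In this toy input the NICI list covers the three cases named in the text: a pair spanning two different complexes, a pair joining a complexed and a noncomplexed protein, and a pair of two noncomplexed proteins.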
Background: Many of the functional units in cells are multi-protein complexes such as RNA polymerase, the ribosome, and the proteasome. For such units to work together, one might expect a high level of regulation to enable co-appearance or repression of sets of complexes at the required time. However, this type of coordinated regulation between whole complexes is difficult to detect by existing methods for analyzing mRNA co-expression. We propose a new methodology that is able to detect such higher order relationships.
Results: We detect coordinated regulation of multiple protein complexes using logic analysis of gene expression data. Specifically, we identify gene triplets composed of genes whose expression profiles are found to be related by various types of logic functions. In order to focus on complexes, we associate the members of a gene triplet with the distinct protein complexes to which they belong. In this way, we identify complexes related by specific kinds of regulatory relationships. For example, we may find that the transcription of complex C is increased only if the transcription of both complex A AND complex B is repressed. We identify hundreds of examples of coordinated regulation among complexes under various stress conditions. Many of these examples involve the ribosome. Some of our examples have been previously identified in the literature, while others are novel. One notable example is the relationship between the transcription of the ribosome, RNA polymerase and mannosyltransferase II, which is involved in N-linked glycan processing in the Golgi.
Conclusions: The analysis proposed here focuses on relationships among triplets of genes that are not evident when genes are examined in a pairwise fashion as in typical clustering methods. By grouping gene triplets, we are able to decipher coordinated regulation among sets of three complexes.
Moreover, using all triplets that involve coordinated regulation with the ribosome, we derive a large network involving this essential cellular complex. In this network we find that all multi-protein complexes that belong to the same functional class are regulated in the same direction as a group (either induced or repressed).
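The triplet idea can be illustrated with a small sketch. It assumes expression profiles have already been binarized to 0 (repressed) and 1 (induced) per condition, and it scores each candidate logic function by the fraction of conditions in which it reproduces gene C. That agreement score is a simple stand-in chosen for illustration; the abstract does not state which score the authors use, and the function names here are hypothetical.

```python
# Candidate logic functions of two binary inputs (a small illustrative subset).
LOGIC_FUNCTIONS = {
    "A AND B": lambda a, b: a & b,
    "A OR B": lambda a, b: a | b,
    "A XOR B": lambda a, b: a ^ b,
    "NOT A AND NOT B": lambda a, b: 1 - (a | b),
}

def agreement(xs, ys):
    """Fraction of conditions in which two binary profiles are equal."""
    return sum(x == y for x, y in zip(xs, ys)) / len(xs)

def best_triplet_logic(a, b, c):
    """Return (best logic function, its agreement with C, best pairwise agreement).

    The pairwise value allows for negation, i.e. C tracking A, NOT A, B, or NOT B.
    """
    scores = {
        name: agreement([f(x, y) for x, y in zip(a, b)], c)
        for name, f in LOGIC_FUNCTIONS.items()
    }
    best = max(scores, key=scores.get)
    pairwise = max(
        agreement(a, c), 1 - agreement(a, c),
        agreement(b, c), 1 - agreement(b, c),
    )
    return best, scores[best], pairwise

# Toy profiles over 8 conditions: C is induced only when A and B are both repressed.
a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
c = [1, 0, 0, 0, 1, 0, 0, 0]
best, triplet_score, pairwise_score = best_triplet_logic(a, b, c)
```

Here the two-input function explains C in every condition, while neither A nor B alone does, which is the kind of relationship that pairwise co-expression analysis would miss. Mapping each gene to the complex it belongs to would then lift such a triplet to a relationship among three complexes.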