A data integration methodology for systems biology: Experimental verification

Hwang, Daehee; Smith, Jennifer J.; Leslie, Deena M.; Weston, Andrea D.; Rust, Alistair G.; Ramsey, Stephen A.; Atauri, Pedro de; Siegel, Andrew F.; Bolouri, Hamid; Aitchison, John D.; Hood, Leroy

doi:10.1073/pnas.0508649102

Cited by 122 publications

(87 citation statements)

References 34 publications

“…The applicability of our methodology to different types and sizes of data and to different numbers of data sets is demonstrated by application to five different types of data integration in a companion paper (10). Although we focused here on presenting our methodology from the perspective of maximizing statistical power, it can also be applied to scenarios for which the different types of data being integrated have systematic differences between them, for example, combining mRNA and protein abundance measurements or in vivo and in vitro measurements.…”

Section: Discussion (mentioning)

confidence: 99%

“…Although we focused here on presenting our methodology from the perspective of maximizing statistical power, it can also be applied to scenarios for which the different types of data being integrated have systematic differences between them, for example, combining mRNA and protein abundance measurements or in vivo and in vitro measurements. Examples of this type of integration are given in the companion paper (10). Data integration can never rule out inclusion of some false positives or loss of some true positives.…”

Section: Discussion (mentioning)

confidence: 99%

A data integration methodology for systems biology

Hwang, Rust, Ramsey et al. 2005

Proc. Natl. Acad. Sci. U.S.A.

Self Cite

302

246

Different experimental technologies measure different aspects of a system and to differing depth and breadth. High-throughput assays have inherently high false-positive and false-negative rates. Moreover, each technology includes systematic biases of a different nature. These differences make network reconstruction from multiple data sets difficult and error-prone. Additionally, because of the rapid rate of progress in biotechnology, there is usually no curated exemplar data set from which one might estimate data integration parameters. To address these concerns, we have developed data integration methods that can handle multiple data sets differing in statistical power, type, size, and network coverage without requiring a curated training data set. Our methodology is general in purpose and may be applied to integrate data from any existing and future technologies. Here we outline our methods and then demonstrate their performance by applying them to simulated data sets. The results show that these methods select true-positive data elements much more accurately than classical approaches. In an accompanying companion paper, we demonstrate the applicability of our approach to biological data. We have integrated our methodology into a free open source software package named POINTILLIST.

Keywords: Fisher's method | mixture distribution models

Systems biology (1, 2) aims to understand cellular behavior in terms of the spatiotemporal interactions among cellular components, such as genes, proteins, metabolites, and organelles. In systems biology, one typically perturbs a system and, with high-throughput measurements to identify all pertinent elements and their interactions, integrates them into a biological network to understand the system's behavior. As such, systems biology is predicated on the integration of experimental data from an ever increasing number of technologies, such as gene expression arrays, proteomics, and chromatin immunoprecipitation on chip assays (3). Integration achieves one of the most important imperatives of systems biology, namely it reduces the dimensionality of global data to deliver useful information about the system of interest.

A major challenge in systems biology is that technologies that globally interrogate biological systems have inherently high false-positive and false-negative rates (4); thus, each data type alone has a limited utility. The integration of data from different sources provides an effective means to deal with this issue by reinforcing bona fide observations and reducing false negatives. Moreover, because different experimental technologies provide different insights into a system, the integration of multiple data types offers the greatest information about a particular cellular process. For example, gene perturbation experiments (e.g., knockouts or RNA interference) reveal relationships between genes that may imply direct physical interactions or indirect logical interactions. In contrast, chromatin immunoprecipitation chip data can reveal direct protein-DNA interactions or cofacto...
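The abstract lists Fisher's method as a keyword. As a rough illustration only, the sketch below combines independent p-values for one element (for example, one gene measured by several technologies) into a single p-value. It is not the POINTILLIST implementation; the function name `fisher_combined_pvalue`, the example gene labels, and the p-values are hypothetical, and independence of the data sets is assumed.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method (illustrative sketch).

    The test statistic X = -2 * sum(ln p_i) follows a chi-square
    distribution with 2k degrees of freedom under the null hypothesis,
    where k is the number of p-values.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2

    # For even degrees of freedom 2k, the chi-square survival function has
    # the closed form exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!.
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    # Hypothetical p-values for two elements across three data sets.
    evidence = {
        "geneA": [0.04, 0.10, 0.03],
        "geneB": [0.60, 0.45, 0.80],
    }
    for name, pvals in evidence.items():
        print(name, fisher_combined_pvalue(pvals))
```

The closed-form tail probability keeps the sketch free of external dependencies; with SciPy available, `scipy.stats.combine_pvalues` offers the same combination.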


“…There are three main categories to construct biomolecular regulatory networks: gene regulatory networks through a mathematical model; networks through literature mining, and integrating multiple data (Friedman et al, 2000;Hwang et al, 2005;Mohamed-Hussein and Harun, 2009). Building a network through literature mining means using bioinformatics, computational biology, and other tools of computer science to analyze the data in the literature, and build biomolecular regulatory networks using the relationships between gene/protein interactions of the existing literature.…”

Section: A B Discussion (mentioning)

confidence: 99%

Construction of gene/protein interaction networks for primary myelofibrosis and KEGG pathway-enrichment analysis of molecular compounds

Sun, Cao, Zhou et al. 2015

Genet. Mol. Res.

ABSTRACT. The objective of this study was the development of a gene/protein interaction network for primary myelofibrosis based on gene expression, and the enrichment analysis of KEGG pathways underlying the molecular complexes in this network. To achieve this, genes involved in primary myelofibrosis were selected from the OMIM database. A gene/protein interaction network for primary myelofibrosis was obtained through Cytoscape with the literature mining performed using the Agilent Literature Search plugin. The molecular complexes in the network were detected by ClusterViz plugin and KEGG pathway enrichment of molecular complexes was performed using DAVID online. We found 75 genes associated with primary myelofibrosis in the OMIM database. The gene/protein interaction network of primary myelofibrosis contained 608 nodes, 2086 edges, and 4 molecular complexes with a correlation integral value greater than 4. Molecular complexes involved in KEGG pathways are related to cytokine regulation, immune function regulation, ECM-receptor interaction, focal adhesion, actin cytoskeleton regulation, cell adhesion molecules, and other biological behavior of tumors, which can provide a reliable direction for the treatment of primary myelofibrosis and the bioinformatic foundation for further understanding the molecular mechanisms of this disease.

Genetics and Molecular Research 14 (4): 16126-16132 (2015)

“…Hwang et al (2005a) have also recognized the issues in data integration, when the different data types vary in size, confidence, and network coverage and have developed a general algorithm based on advanced statistical techniques to better interpret the data. These authors used the developed approach to analyze 18 data sets including mRNA, protein levels, protein-DNA interaction data, and protein-protein interaction data related to galactose utilization in S. cerevisiae (Hwang et al, 2005b) and identified 69 genes that were perturbed significantly in the data sets. Additionally, the analysis suggested that fructose metabolism would be down regulated in the presence of galactose via the downregulation of a hexose transporter, a hypothesis that was experimentally verified through the measurement of corresponding protein levels.…”

Section: Systems-level Data Analysis and Mining (mentioning)

confidence: 99%

Systems Biology: The synergistic interplay between biology and mathematics

Dhurjati, Mahadevan 2008

Can J Chem Eng

Systems Biology is a nascent field that arose from the technology driven omics measurement revolution. It goes beyond mere data analysis and focuses on the biological behaviour emerging from the dynamic interactions between system components that are organized in a hierarchical and highly connected manner. Mathematical models have been used as a conceptual framework to study such systems and their impact is maximal when there is a synergistic interplay of the models with experimental data and biological domain knowledge. The review provides an introduction to the modelling process and selectively highlights manuscripts that have strong biological impact.
