Background: Genome-wide time-series data provide a rich set of information for discovering gene regulatory relationships. As genome-wide data for mammalian systems are being generated, it is critical to develop network inference methods that can handle tens of thousands of genes efficiently, provide a systematic framework for the integration of multiple data sources, and yield robust, accurate and compact gene-to-gene relationships.
Results: We developed and applied ScanBMA, a Bayesian inference method that incorporates external information to improve the accuracy of the inferred network. In particular, we developed a new strategy to efficiently search the model space, applied data transformations to reduce the effect of spurious relationships, and adopted the g-prior to guide the search for candidate regulators. Our method is highly computationally efficient, thus addressing the scalability issue with network inference. The method is implemented as the ScanBMA function in the networkBMA Bioconductor software package.
Conclusions: We compared ScanBMA to other popular methods using time-series yeast data as well as time-series simulated data from the DREAM competition. We found that ScanBMA produced more compact networks with a greater proportion of true positives than the competing methods. Specifically, ScanBMA generally produced more favorable areas under the Receiver-Operating Characteristic and Precision-Recall curves than other regression-based methods and mutual-information-based methods. In addition, ScanBMA is competitive with other network inference methods in terms of running time.
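As a rough illustration of Bayesian model averaging under a g-prior, the sketch below scores every small subset of candidate regulators for one target gene and sums the normalized model weights into per-regulator inclusion probabilities. It is not ScanBMA's search strategy: it enumerates models exhaustively, assumes a uniform prior over models, uses the standard closed-form Bayes factor against the intercept-only model for Zellner's g-prior, and defaults to g = n. The function names `log_bf_gprior` and `inclusion_probabilities` are hypothetical and not part of networkBMA.

```python
import itertools
import numpy as np

def log_bf_gprior(y, X, g):
    """Log Bayes factor of a linear model with predictors X versus the
    intercept-only model, under Zellner's g-prior (standard closed form)."""
    n, p = X.shape
    if p == 0:
        return 0.0  # the null model compared with itself
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    resid = yc - Xc @ beta
    r2 = 1.0 - (resid @ resid) / (yc @ yc)
    return 0.5 * (n - 1 - p) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))

def inclusion_probabilities(y, X, g=None, max_size=3):
    """Posterior inclusion probability of each column of X as a regulator of y.
    Assumption: uniform prior over all models with at most max_size regulators."""
    n, k = X.shape
    g = float(n) if g is None else g  # illustrative default, not from the paper
    models, log_bfs = [], []
    for size in range(max_size + 1):
        for subset in itertools.combinations(range(k), size):
            models.append(subset)
            log_bfs.append(log_bf_gprior(y, X[:, list(subset)], g))
    log_bfs = np.array(log_bfs)
    weights = np.exp(log_bfs - log_bfs.max())  # subtract max for numerical stability
    weights /= weights.sum()
    probs = np.zeros(k)
    for subset, w in zip(models, weights):
        for j in subset:
            probs[j] += w
    return probs
```

Exhaustive enumeration only works for a handful of candidates; scaling to thousands of genes is exactly what a guided search over the model space is meant to address.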
Conflict of interest: EW receives a salary from Celgene, Inc. JMO receives a salary from Gilead Sciences, Inc. MKL has received research support from TxCell, Pfizer, and Bristol Myers Squibb and has patents pending related to alloantigen-specific chimeric antigen receptors (PCT/CA2018/051167 and PCT/CA2018/051174). PSL receives research support from Bristol Myers Squibb and is an inventor on patent US5844095A, "CTLA4Ig fusion proteins." RG has received consulting income from Juno Therapeutics, Takeda, Infotech Soft, and Celgene, Inc., and has received research support from Janssen Pharmaceuticals and Juno Therapeutics.
The NIH Library of Integrated Network-based Cellular Signatures (LINCS) contains gene expression data from over a million experiments, measured using Luminex Bead technology. Only 500 colors are used to measure the expression levels of the 1,000 landmark genes, so each color is shared by a pair of genes and the data for each pair must be deconvolved. The raw data are sometimes inadequate for reliable deconvolution, leading to artifacts in the final processed data. These artifacts include the expression levels of paired genes being flipped, paired genes being given the same value, and clusters of values that are not at the true expression level. We propose a new method called model-based clustering with data correction (MCDC) that is able to identify and correct these three kinds of artifacts simultaneously. We show that MCDC improves the resulting gene expression data in terms of agreement with external baselines, and also improves the results of subsequent analysis.
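To make the "flipped pair" artifact concrete, here is a deliberately simplified sketch that handles only that one artifact. It fits a single bivariate Gaussian to the paired expression values, flags each observation whose swapped version is more likely under the fitted Gaussian, and repeats until the flags stop changing. This is not the MCDC algorithm itself, which the abstract describes as model-based clustering that corrects three artifact types at once; the function names and the single-cluster assumption are illustrative only.

```python
import numpy as np

def log_gauss(points, mean, cov):
    """Log density of each row of points under a multivariate Gaussian."""
    diff = points - mean
    inv = np.linalg.inv(cov)
    maha = np.einsum("ij,jk,ik->i", diff, inv, diff)  # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (maha + logdet + points.shape[1] * np.log(2 * np.pi))

def correct_flips(pairs, n_iter=20):
    """pairs: (n, 2) array of expression values for two genes sharing a color.
    Returns the corrected array and a boolean mask of rows judged to be flipped.
    Assumption: one Gaussian cluster and a minority of flipped rows."""
    pairs = np.asarray(pairs, dtype=float)
    swapped = pairs[:, ::-1]
    flipped = np.zeros(len(pairs), dtype=bool)
    for _ in range(n_iter):
        current = np.where(flipped[:, None], swapped, pairs)
        mean = current.mean(axis=0)
        cov = np.cov(current, rowvar=False)
        # A row is flagged when its swapped version fits the cluster better.
        new_flipped = log_gauss(swapped, mean, cov) > log_gauss(pairs, mean, cov)
        if np.array_equal(new_flipped, flipped):
            break
        flipped = new_flipped
    corrected = np.where(flipped[:, None], swapped, pairs)
    return corrected, flipped
```

Re-estimating the cluster after each round of corrections matters because the first fit is contaminated by the flipped rows themselves.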
BACKGROUND: Inferring genetic networks from genome-wide expression data is extremely demanding computationally. We have developed fastBMA, a distributed, parallel and scalable implementation of Bayesian model averaging (BMA) for this purpose. fastBMA also includes a novel and computationally efficient method for eliminating redundant indirect edges in the network.
FINDINGS: We evaluated the performance of fastBMA on synthetic data and experimental genome-wide yeast and human datasets. When using a single CPU core, fastBMA is up to 100 times faster than the next fastest method, LASSO, with increased accuracy. It is a memory efficient, parallel and distributed application that scales to human genome wide expression data. A 10,000-gene regulation network can be obtained in a matter of hours using a 32-core cloud cluster.
CONCLUSIONS: fastBMA is a significant improvement over its predecessor ScanBMA. It is orders of magnitude faster and more accurate than other fast network inference methods such as LASSO. The improved scalability allows it to calculate networks from genome scale data in a reasonable timeframe. The transitive reduction method can improve accuracy in denser networks. fastBMA is available as code (M.I.T. license) from GitHub (https://github.com/lhhunghimself/fastBMA), as part of the updated networkBMA Bioconductor package (https://www.bioconductor.org/packages/release/bioc/html/networkBMA.html) and as ready-to-deploy Docker images (https://hub.docker.com/r/biodepot/fastbma/).
bioRxiv preprint first posted online Jan. 6, 2017; doi: http://dx.doi.org/10.1101/099036. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
Findings
BACKGROUND: Genetic regulatory networks capture the complex relationships between biological entities which help us to identify putative driver and passenger genes in various diseases [1,2]. Many approaches have been proposed to infer genetic networks using gene expression data, for example, co-expression networks [3], mutual information-based methods [4,5], Bayesian networks [6][7][8], ordinary differential equations [9,10], regression-based methods [11][12][13][14] and ensemble methods [15]. In addition, methods have been proposed to infer gene networks using multiple data sources, e.g. [16][17][18][19].
Our Contributions: We have previously described ScanBMA [14], an implementation of Bayesian model averaging (BMA) [20] for inferring regulatory networks. ScanBMA is available from the "networkBMA" Bioconductor package [21], written in R and C++. It has been shown that ScanBMA generates comp...
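The fastBMA abstract above mentions eliminating redundant indirect edges by transitive reduction. One way such pruning can be sketched is shown below, under an assumed rule that the abstract does not spell out: a direct edge is dropped when some indirect path between the same two genes has a higher product of edge posterior probabilities. Taking the cost of an edge as the negative log of its probability turns this into a shortest-path search, done here with Dijkstra's algorithm. The function name `prune_indirect_edges` and the pruning rule are illustrative assumptions, not fastBMA's actual code.

```python
import heapq
import math

def prune_indirect_edges(edges):
    """edges: dict mapping (source, target) -> posterior probability in (0, 1].
    Returns the edges that are not beaten by a more probable indirect path.
    Every edge is tested against the original, unpruned graph."""
    cost = {e: -math.log(p) for e, p in edges.items()}
    neighbors = {}
    for (u, v), c in cost.items():
        neighbors.setdefault(u, []).append((v, c))
    kept = {}
    for (src, dst), p in edges.items():
        direct = cost[(src, dst)]
        best = {src: 0.0}
        heap = [(0.0, src)]
        found = False
        while heap:
            d, node = heapq.heappop(heap)
            if d > best.get(node, math.inf):
                continue  # stale heap entry
            if d >= direct:
                break  # no remaining path can be cheaper than the direct edge
            if node == dst:
                found = True  # a strictly cheaper indirect path exists
                break
            for nxt, c in neighbors.get(node, []):
                if node == src and nxt == dst:
                    continue  # skip the direct edge under test
                nd = d + c
                if nd < best.get(nxt, math.inf):
                    best[nxt] = nd
                    heapq.heappush(heap, (nd, nxt))
        if not found:
            kept[(src, dst)] = p
    return kept
```

For example, with edges A→B and B→C at probability 0.9 each, a direct edge A→C at 0.5 is removed because the indirect path has probability 0.81, whereas a direct edge at 0.95 would be kept.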