Gene regulatory networks are composed of sub-networks that are often shared across biological processes, cell types, and organisms. Leveraging multiple sources of information, such as publicly available gene expression datasets, could therefore be helpful when learning a network of interest. Integrating data across different studies, however, raises numerous technical concerns. Hence, a common approach in network inference, and broadly in genomics research, is to separately learn models from each dataset and combine the results. Individual models, however, often suffer from under-sampling, poor generalization, and limited network recovery. In this study, we explore previous integration strategies, such as batch correction and model ensembles, and introduce a new multitask learning approach for joint network inference across several datasets. Our method first estimates the activities of transcription factors and then infers the relevant network topology. As regulatory interactions are context-dependent, we estimate model coefficients as a combination of dataset-specific and conserved components. In addition, adaptive penalties may be used to favor models that include interactions derived from multiple sources of prior knowledge, including orthogonal genomics experiments. We evaluate generalization and network recovery using examples from Bacillus subtilis and Saccharomyces cerevisiae, and show that sharing information across models improves network reconstruction. Finally, we demonstrate robustness to both false positives in the prior information and heterogeneity among datasets.

methods are not applicable when integrating public data from multiple sources with widely differing experimental designs.

In network inference, an approach often taken to bypass batch effects is to learn models from each dataset separately and combine the resulting networks [16,17].
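The combine step of this per-dataset strategy is often implemented by rank-averaging edge confidences across the individually inferred networks. The following is a minimal sketch of that idea; the per-dataset inference step is stubbed out with random confidence matrices, and all names here are illustrative rather than taken from any specific tool.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tfs, n_genes, n_datasets = 3, 5, 4

# One edge-confidence matrix (TF x gene) per dataset, e.g. from regression
# weights; random placeholders stand in for a real inference algorithm.
per_dataset_scores = [rng.random((n_tfs, n_genes)) for _ in range(n_datasets)]

def rank_average(score_matrices):
    """Combine edge-confidence matrices by averaging their within-network ranks."""
    ranks = []
    for s in score_matrices:
        flat = s.ravel()
        # argsort of argsort yields 0-based ranks; a higher score gets a higher rank
        r = flat.argsort().argsort().reshape(s.shape)
        ranks.append(r / (s.size - 1))  # normalize ranks to [0, 1]
    return np.mean(ranks, axis=0)

consensus = rank_average(per_dataset_scores)
# Edges supported consistently across datasets receive high consensus ranks.
```

Ranking within each network before averaging makes the combination insensitive to differences in score scale between datasets, which is one reason this style of aggregation is popular for merging heterogeneous models.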
Known as ensemble learning, this idea of synthesizing several weaker models into a stronger aggregate model is commonly used in machine learning to prevent overfitting and build more generalizable prediction models [18]. In several scenarios, ensemble learning avoids the additional artifacts and complexity that may be introduced by explicitly modeling batch effects. On the other hand, the relative sample size of each dataset is smaller when using ensemble methods, likely decreasing the ability of an algorithm to detect relevant interactions. Regulatory networks are highly context-dependent [19]; for example, TF binding to several promoters is condition-specific [20]. A drawback of both batch-correction and ensemble methods is therefore that they produce a single network model to explain the data across datasets. Relevant dataset-specific interactions might not be recovered, or may be difficult to tell apart, using a single model.

Although it will not be the primary focus of this paper, most modern network inference algorithms integrate multiple data types to derive priors or constraints on net...