Gene regulatory networks are composed of sub-networks that are often shared across biological processes, cell types, and organisms. Leveraging multiple sources of information, such as publicly available gene expression datasets, could therefore be helpful when learning a network of interest. Integrating data across different studies, however, raises numerous technical concerns. Hence, a common approach in network inference, and broadly in genomics research, is to separately learn models from each dataset and combine the results. Individual models, however, often suffer from under-sampling, poor generalization, and limited network recovery. In this study, we explore previous integration strategies, such as batch correction and model ensembles, and introduce a new multitask learning approach for joint network inference across several datasets. Our method first estimates the activities of transcription factors and then infers the relevant network topology. As regulatory interactions are context-dependent, we estimate model coefficients as a combination of dataset-specific and conserved components. In addition, adaptive penalties may be used to favor models that include interactions derived from multiple sources of prior knowledge, including orthogonal genomics experiments. We evaluate generalization and network recovery using examples from Bacillus subtilis and Saccharomyces cerevisiae, and show that sharing information across models improves network reconstruction. Finally, we demonstrate robustness to both false positives in the prior information and heterogeneity among datasets.

methods are not applicable when integrating public data from multiple sources with widely differing experimental designs.

In network inference, an approach often taken to bypass batch effects is to learn models from each dataset separately and combine the resulting networks [16,17].
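The combine step of this per-dataset strategy is often implemented by rank-averaging edge confidences across the individually inferred networks. The following is a minimal sketch of that idea; the per-dataset inference step is stubbed out with random confidence matrices, and all names here are illustrative rather than taken from any specific tool.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tfs, n_genes, n_datasets = 3, 5, 4

# One edge-confidence matrix (TF x gene) per dataset, e.g. from regression
# weights; random placeholders stand in for a real inference algorithm.
per_dataset_scores = [rng.random((n_tfs, n_genes)) for _ in range(n_datasets)]

def rank_average(score_matrices):
    """Combine edge-confidence matrices by averaging their within-network ranks."""
    ranks = []
    for s in score_matrices:
        flat = s.ravel()
        # argsort of argsort yields 0-based ranks; a higher score gets a higher rank
        r = flat.argsort().argsort().reshape(s.shape)
        ranks.append(r / (s.size - 1))  # normalize ranks to [0, 1]
    return np.mean(ranks, axis=0)

consensus = rank_average(per_dataset_scores)
# Edges supported consistently across datasets receive high consensus ranks.
```

Ranking within each network before averaging makes the combination insensitive to differences in score scale between datasets, which is one reason this style of aggregation is popular for merging heterogeneous models.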
Known as ensemble learning, this idea of synthesizing several weaker models into a stronger aggregate model is commonly used in machine learning to prevent overfitting and build more generalizable prediction models [18]. In several scenarios, ensemble learning avoids the additional artifacts and complexity that may be introduced by explicitly modeling batch effects. On the other hand, the relative sample size of each dataset is smaller when using ensemble methods, likely decreasing the ability of an algorithm to detect relevant interactions. Regulatory networks are highly context-dependent [19]; for example, TF binding to several promoters is condition-specific [20]. A drawback of both batch-correction and ensemble methods is therefore that they produce a single network model to explain the data across datasets. Relevant dataset-specific interactions might not be recovered, or may be difficult to tell apart, using a single model.

Although it will not be the primary focus of this paper, most modern network inference algorithms integrate multiple data types to derive priors or constraints on net...