Probabilistic Boolean Networks (PBNs) were introduced as a computational model for studying gene interactions in Gene Regulatory Networks (GRNs). Control of PBNs, and hence of GRNs, is the process of making strategic interventions in a network in order to drive it from a particular state towards some other, potentially more desirable, state. This is of significant importance to systems biology, as successful control could yield potential gene treatments through therapeutic interventions. Recent advances in Deep Reinforcement Learning have enabled systems to develop policies merely by interacting with their environment, without complete knowledge of the underlying Markov Decision Process (MDP). In this paper we propose the use of a Deep Q Network with Double Q Learning that interacts directly with the environment, that is, with a Probabilistic Boolean Network. The proposed approach is trained by sampling experiences from the environment using Prioritised Experience Replay, and it successfully determines a control policy that directs a PBN from any state to the desired state (attractor). We demonstrate successful results on PBNs significantly larger than those handled by previous approaches under our control framework.
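For concreteness, the following is a minimal sketch of the Double Q-learning target such an agent would optimise. The network architecture, the action encoding (one bit-flip per gene plus a no-op), and all identifiers here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a Boolean network state (a 0/1 vector of gene values) to
    Q-values, one per intervention action (flip one gene, or do nothing).
    Sizes and layers are illustrative assumptions."""
    def __init__(self, n_genes: int):
        super().__init__()
        n_actions = n_genes + 1  # one flip per gene, plus "no intervention"
        self.net = nn.Sequential(
            nn.Linear(n_genes, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, state):
        return self.net(state)

def double_q_target(online_net, target_net, reward, next_state, done, gamma=0.99):
    """Double Q-learning target: the online network *selects* the next
    action, the target network *evaluates* it, which reduces the
    overestimation bias of vanilla Q-learning."""
    with torch.no_grad():
        best_action = online_net(next_state).argmax(dim=1, keepdim=True)
        next_q = target_net(next_state).gather(1, best_action).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q
```

The key design point is the decoupling of action selection from action evaluation across the two networks; everything else is a standard DQN regression onto this target.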
In this paper we describe the application of a Deep Reinforcement Learning agent to the problem of controlling Gene Regulatory Networks (GRNs). The proposed approach is applied to Random Boolean Networks (RBNs), which have been used extensively as a computational model for GRNs. The ability to control GRNs is central to therapeutic interventions for diseases such as cancer: the agent learns to make interventions that direct the GRN from some initial state towards a desired attractor, with at most one intervention allowed per time step. Our agent interacts directly with the environment, an RBN, without any knowledge of the network's underlying dynamics, structure, or connectivity. We have implemented a Deep Q Network with Double Q Learning that is trained by sampling experiences from the environment using Prioritized Experience Replay. We show that the proposed approach develops a policy that successfully learns to control RBNs significantly larger than those addressed by previous learning-based implementations. We also discuss why learning to control an RBN with zero knowledge of its underlying dynamics is important, and argue that the agent is encouraged to discover and perform control interventions that are optimal with respect to cost and number of interventions.
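As a rough illustration of the Prioritized Experience Replay component, here is a sketch of a proportional-priority buffer in the style of Schaul et al. (2016). The flat-array storage and the default hyperparameters are assumptions chosen for clarity; a practical implementation would use a sum-tree for O(log n) sampling.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Proportional prioritized experience replay (sketch).
    Transitions with larger TD error are sampled more often."""
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha            # 0 = uniform, 1 = fully prioritized
        self.buffer = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current max priority so that each
        # experience is sampled at least once.
        max_prio = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.buffer)] ** self.alpha
        probs = prios / prios.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights correct for the bias introduced
        # by non-uniform sampling.
        weights = (len(self.buffer) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.buffer[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # eps keeps priorities strictly positive.
        self.priorities[idx] = np.abs(td_errors) + eps
```

After each training step, the sampled transitions' priorities are refreshed with their new TD errors via `update_priorities`.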
An ensemble inference mechanism is proposed for the Angry Birds domain. It is based on an efficient tree structure for encoding and representing game screenshots, exploiting that structure's enhanced modeling capability. This has the advantage of establishing an informative feature space and recasting the task of game playing as a regression analysis problem. To this end, we assume that each pair of object material and bird type has its own Bayesian linear regression model. In this way, a multi-model regression framework is designed that simultaneously calculates the conditional expectations of several objects and makes a target decision through an ensemble of regression models. The learning procedure follows an online estimation strategy for the model parameters. We provide comparative experimental results on several game levels that empirically illustrate the efficiency of the proposed methodology.
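The following is a minimal sketch of the kind of online Bayesian linear regression update that could underlie each per-(material, bird) model. The zero-mean Gaussian prior, the fixed noise precision `beta`, and all names are assumptions for illustration; the tree-based screenshot encoding that would supply the feature vectors is not shown.

```python
import numpy as np

class OnlineBayesianLinearRegression:
    """Conjugate Bayesian linear regression updated one observation at a
    time. One such model per (object material, bird type) pair, as the
    abstract describes; hyperparameters here are illustrative."""
    def __init__(self, dim, alpha=1.0, beta=25.0):
        self.beta = beta                   # assumed known noise precision
        self.S_inv = alpha * np.eye(dim)   # posterior precision (starts at prior)
        self.m = np.zeros(dim)             # posterior mean
        self._b = np.zeros(dim)            # running sum of beta * y * x

    def update(self, x, y):
        # Standard conjugate update: S_N^{-1} = alpha*I + beta * sum x x^T,
        # m_N = S_N * beta * sum y x.
        self.S_inv += self.beta * np.outer(x, x)
        self._b += self.beta * y * x
        self.m = np.linalg.solve(self.S_inv, self._b)

    def predict(self, x):
        # Posterior predictive mean and variance at feature vector x.
        mean = self.m @ x
        var = 1.0 / self.beta + x @ np.linalg.solve(self.S_inv, x)
        return mean, var
```

An ensemble decision can then be made by querying each pair's model for its predictive mean (the conditional expectation) and selecting the target that maximises it.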
Imitation learning algorithms have been interpreted as variants of divergence minimization problems. The ability to compare occupancy measures between experts and learners is crucial to their effectiveness in learning from demonstrations. In this paper, we present tractable solutions by formulating imitation learning as minimization of the Sinkhorn distance between occupancy measures. The formulation combines the valuable properties of optimal transport metrics in comparing non-overlapping distributions with a cosine distance cost defined in an adversarially learned feature space. This leads to a highly discriminative critic network and an optimal transport plan that subsequently guide imitation learning. We evaluate the proposed approach on a number of MuJoCo experiments, using both the reward metric and the Sinkhorn distance metric.
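As a worked illustration, here is a self-contained sketch of the entropy-regularized Sinkhorn distance between two empirical occupancy measures, paired with a cosine cost. In the paper's setting the cost is computed in an adversarially learned feature space; `cosine_cost` here operates on raw samples, and all names and hyperparameters are assumptions.

```python
import numpy as np

def cosine_cost(x, y):
    """Pairwise cosine distance, c(x, y) = 1 - <x, y> / (|x| |y|)."""
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    yn = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - xn @ yn.T

def sinkhorn_distance(x, y, cost_fn=cosine_cost, eps=0.1, n_iters=100):
    """Entropy-regularized optimal transport (Sinkhorn) between two
    empirical measures given as sample arrays x (n, d) and y (m, d).
    Returns the transport cost and the transport plan."""
    n, m = len(x), len(y)
    C = cost_fn(x, y)                 # (n, m) pairwise cost matrix
    K = np.exp(-C / eps)              # Gibbs kernel
    a = np.full(n, 1.0 / n)           # uniform weights on learner samples
    b = np.full(m, 1.0 / m)           # uniform weights on expert samples
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):          # alternating marginal projections
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]   # transport plan diag(u) K diag(v)
    return np.sum(P * C), P
```

The resulting plan `P` indicates which learner samples are matched to which expert samples, and the scalar cost can serve as the minimization objective driving the policy update.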