2018
DOI: 10.1561/9781680835397
An Introduction to Deep Reinforcement Learning

Cited by 393 publications (274 citation statements)
“…Hence, a policy is commonly represented by a function approximator to overcome this difficulty [60]. The combination of RL and deep learning (called deep RL [62]) has been successful in handling large-scale, complicated tasks by using deep neural networks (DNNs) as function approximators [63], [64], but at the expense of complexity. Therefore, this study still considers an RL method and uses a simple neural network (NN) with one fully connected hidden layer to represent the policy (called the policy network), as shown in Fig.…”
Section: Reinforcement Learning Based ON-MTP
confidence: 99%
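The policy network described in this statement — a single fully connected hidden layer mapping a state to action probabilities — can be sketched as follows. All sizes, the tanh nonlinearity, and the softmax output are illustrative assumptions, not details from the cited study:

```python
import numpy as np

# Hypothetical sketch of a policy network with one fully connected
# hidden layer. State/hidden/action dimensions are assumptions.
rng = np.random.default_rng(0)
STATE_DIM, HIDDEN_DIM, N_ACTIONS = 8, 16, 2

W1 = rng.normal(scale=0.1, size=(HIDDEN_DIM, STATE_DIM))
b1 = np.zeros(HIDDEN_DIM)
W2 = rng.normal(scale=0.1, size=(N_ACTIONS, HIDDEN_DIM))
b2 = np.zeros(N_ACTIONS)

def policy(state):
    """Map a state vector to a probability distribution over actions."""
    h = np.tanh(W1 @ state + b1)         # the single hidden layer
    logits = W2 @ h + b2
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

probs = policy(rng.normal(size=STATE_DIM))
```

Because there is only one hidden layer, this remains a "simple NN" function approximator in the sense of the quote, rather than a deep network.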
“…The first layer (called the input layer) is given the input values, where all the images in the state space are collected as input to this NN. The values of the middle layer (called the hidden layer) are a transformation of the input values by a non-linear parametric function [62]. The last layer (called the output layer) provides the output values transformed from the hidden layer, which can output an action deciding to accept or reject a new TR.…”
Section: Reinforcement Learning Based ON-MTP
confidence: 99%
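The three-layer structure in this quote — input layer, non-linear hidden layer, output layer producing an accept/reject decision — can be illustrated with a minimal forward pass. The input size, ReLU nonlinearity, and the argmax decision rule are assumptions for the sketch:

```python
import numpy as np

# Illustrative input -> hidden -> output forward pass. The two output
# units stand for the accept/reject decision described in the quote.
rng = np.random.default_rng(1)
n_in, n_hidden = 32, 10  # e.g. flattened image features in the state

W_h = rng.normal(scale=0.1, size=(n_hidden, n_in))
W_o = rng.normal(scale=0.1, size=(2, n_hidden))  # accept / reject scores

def decide(x):
    hidden = np.maximum(0.0, W_h @ x)  # non-linear transform (ReLU assumed)
    out = W_o @ hidden                 # output layer: one score per decision
    return "accept" if out[0] >= out[1] else "reject"

action = decide(rng.normal(size=n_in))
```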
“…Deep reinforcement learning describes a class of goal-oriented machine learning algorithms taking advantage of powerful function approximators in the context of deep learning [21, 22]. Unlike supervised or unsupervised machine learning, these algorithms do not require a dedicated set of training data, since they are designed to learn from experience by interacting with their environment.…”
Section: A. Deep Reinforcement Learning
confidence: 99%
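The "learn from experience by interacting with the environment" point — no fixed training set, transitions generated online — can be shown with a tiny interaction loop. The two-armed bandit environment, the epsilon-greedy rule, and all constants are illustrative assumptions:

```python
import random

# Minimal agent-environment interaction loop: data arrives from
# interaction, not from a pre-built dataset. Bandit setup is hypothetical.
ACTIONS = [0, 1]
TRUE_REWARD = {0: 0.2, 1: 0.8}  # success probabilities, hidden from the agent

q = {a: 0.0 for a in ACTIONS}   # running action-value estimates
counts = {a: 0 for a in ACTIONS}
rng = random.Random(0)

for step in range(2000):
    # epsilon-greedy: mostly exploit the current estimate, sometimes explore
    if rng.random() < 0.1:
        a = rng.choice(ACTIONS)
    else:
        a = max(ACTIONS, key=lambda x: q[x])
    reward = 1.0 if rng.random() < TRUE_REWARD[a] else 0.0
    counts[a] += 1
    q[a] += (reward - q[a]) / counts[a]  # incremental mean update
```

After enough interaction the estimates reflect the environment, with no labeled examples ever provided.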
“…A contour that was valid for the initial guess of the function would then have to be re-adjusted. The goal of this study is to provide a proof of principle that a deep reinforcement learning (DRL) agent (see e.g. [21] for a recent review and [22] for a standard textbook on the subject) can be trained to conduct the contour deformations as needed. Such an agent could then be used in an iterative setting by deducing the contour deformation from observing the integration plane before each iteration step is conducted.…”
Section: Introduction
confidence: 99%
“…The purpose of the present study is to explore how the brain's RL machinery might utilise these opposing properties to achieve complex behavior. Much of the recent success of RL has been due to the combination of classical RL approaches with the function approximation properties of Deep Neural Networks (DNNs), known as deep RL (François-Lavet et al. 2018). Typically in deep RL, the action-value function Q(s, a) is represented using a DNN that takes the state s_t as input and outputs the corresponding action values for that state.…”
Section: Introduction
confidence: 99%
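The Q-network representation in this last statement — a network that takes the state s_t as input and emits one action value per action in a single forward pass — can be sketched as below. The architecture, sizes, and tanh activation are assumptions:

```python
import numpy as np

# Sketch of the deep-RL value representation: the network maps a state
# to Q(s, a) for every action at once. Sizes are hypothetical.
rng = np.random.default_rng(2)
state_dim, hidden, n_actions = 4, 24, 3

W1 = rng.normal(scale=0.1, size=(hidden, state_dim))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.1, size=(n_actions, hidden))
b2 = np.zeros(n_actions)

def q_values(state):
    """Return the vector [Q(s, a_0), ..., Q(s, a_{n-1})]."""
    h = np.tanh(W1 @ state + b1)
    return W2 @ h + b2  # one value per action

qs = q_values(rng.normal(size=state_dim))
greedy_action = int(np.argmax(qs))  # acting greedily w.r.t. Q
```

Emitting all action values in one pass is what makes greedy action selection a single argmax rather than one network evaluation per action.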