2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA) 2018
DOI: 10.1109/iccubea.2018.8697808
Dynamic Actor-Critic: Reinforcement Learning Based Radio Resource Scheduling for LTE-Advanced

Cited by 15 publications (5 citation statements)
References 5 publications
“…Most reinforcement learning algorithms need a good estimate of the quantities in equations (12) and (13). In practice, we usually use the empirical mean return in place of the expected return of the random variable and, to simplify computer implementation, compute it as an incremental mean.…”
Section: Model-free Methods
confidence: 99%
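The incremental-mean idea in the passage above can be sketched as follows. This is a minimal illustration, not code from the cited paper; the function name and the sample returns are invented:

```python
# Sketch: incremental-mean estimate of a state's expected return, as used
# in model-free Monte Carlo methods. Instead of storing all returns
# G_1..G_N and averaging, apply the running update V <- V + (G - V)/N
# after each newly observed return G.

def incremental_mean_update(value, count, new_return):
    """Fold one new return sample into a running mean."""
    count += 1
    value += (new_return - value) / count
    return value, count

# The running mean matches the batch empirical mean:
returns = [4.0, 2.0, 6.0, 0.0]
v, n = 0.0, 0
for g in returns:
    v, n = incremental_mean_update(v, n, g)
# v == sum(returns) / len(returns) == 3.0
```

The incremental form needs only O(1) memory per state, which is why it is preferred for computer implementation over storing every observed return.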
“…It is therefore difficult to find the optimal control policy when the variables are continuous. For reinforcement learning problems with continuous variables, we can use an actor-critic algorithm based on a policy gradient [13]. Notably, the policy gradient belongs to the family of policy-based reinforcement learning methods.…”
confidence: 99%
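As a rough illustration of the actor-critic idea for continuous actions described above, here is a minimal one-step actor-critic sketch with a Gaussian policy and a linear critic. Everything here (the toy state and reward model, parameter names, and learning rates) is invented for illustration and is not from the cited papers:

```python
# Minimal one-step actor-critic for a continuous action.
# Gaussian policy pi(a|s) = N(theta*s, sigma^2); linear critic V(s) = w*s.
# The TD error from the critic scales the actor's policy-gradient step.
import random

theta, w = 0.0, 0.0          # actor and critic parameters
sigma = 0.5                  # fixed exploration noise
alpha_a, alpha_c, gamma = 0.01, 0.1, 0.9

random.seed(0)
for _ in range(2000):
    s = random.uniform(0.5, 1.5)           # toy state
    a = random.gauss(theta * s, sigma)     # sample a continuous action
    r = -(a - 2.0 * s) ** 2                # reward peaks at a = 2s
    s2 = random.uniform(0.5, 1.5)          # toy next state
    td_error = r + gamma * w * s2 - w * s  # critic's TD error
    w += alpha_c * td_error * s            # critic: semi-gradient TD(0)
    # actor: policy gradient, grad log pi(a|s) = (a - theta*s) * s / sigma^2
    theta += alpha_a * td_error * (a - theta * s) * s / sigma ** 2

# theta should drift toward 2.0, the reward-maximizing coefficient
```

The key point matching the quoted passage: because the policy is an explicit parameterized distribution, the same update works for continuous actions, where value-based methods that maximize over a discrete action set do not directly apply.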
“…Additionally, the authors in [23] proposed learning schemes that enable cognitive users to jointly learn their optimal payoffs and strategies for both continuous and discrete actions. The authors in [24] proposed an actor-critic reinforcement learning scheme for a downlink radio resource scheduling policy in Long Term Evolution-Advanced (LTE-A), achieving efficient resource scheduling while maintaining user fairness and high QoS. In [25], the authors proposed a reinforcement learning scheme that optimizes the routing strategy without human participation.…”
Section: Related Work
confidence: 99%
“…In ORA, the edge server serves as an agent that iteratively learns to make the right decision in reaction to the current state, i.e., it tries to find an optimal policy, π : S → A, that maximizes the discounted future reward R = Σ_{t=0}^{T} γ^t r_t, where T is the time horizon, r_t is the immediate reward at time t, and γ ∈ [0, 1] is a discount factor. In this paper, due to the large action space of the joint action (x, y), we employ the computationally efficient actor-critic approach of reinforcement learning to learn the policy [28], in which the agent is equipped with two neural networks: an actor network and a critic network. Note that the actor-critic approach is a combination of the Q-learning algorithm and the policy gradient algorithm.…”
Section: The Resource Allocation Algorithm
confidence: 99%
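The discounted return R = Σ_{t=0}^{T} γ^t r_t defined in the passage above can be computed without explicitly raising γ to powers, by folding the reward sequence back-to-front. A minimal sketch (function name and sample rewards are invented):

```python
# Sketch: computing the discounted return R = sum_{t=0}^{T} gamma^t * r_t.
# Iterating over rewards in reverse applies Horner's rule:
#   R_t = r_t + gamma * R_{t+1}
# which matches the definition in O(T) multiplications.

def discounted_return(rewards, gamma):
    """Return sum over t of gamma**t * rewards[t]."""
    ret = 0.0
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret

rewards = [1.0, 0.0, 2.0]
print(discounted_return(rewards, 0.5))  # 1 + 0.5*0 + 0.25*2 = 1.5
```

The same backward recursion is what gives the TD target r_t + γV(s_{t+1}) its form: each step's return is its immediate reward plus the discounted return of the remainder.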