2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA) 2018
DOI: 10.1109/iccubea.2018.8697808
Dynamic Actor-Critic: Reinforcement Learning Based Radio Resource Scheduling for LTE-Advanced

Cited by 15 publications (5 citation statements)
References 5 publications
“…Most reinforcement learning algorithms need a good estimate of the quantities in equations (12) and (13). In practice, we usually use the empirical mean return in place of the expected return of the random variable and, to simplify computer implementation, compute it as an incremental mean.…”
Section: Model-free Methods
confidence: 99%
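The incremental-mean idea in the passage above can be sketched as follows. This is a minimal illustration, not code from the cited paper; the function name and the sample returns are invented:

```python
# Sketch: incremental-mean estimate of a state's expected return, as used
# in model-free Monte Carlo methods. Instead of storing all returns
# G_1..G_N and averaging, apply the running update V <- V + (G - V)/N
# after each newly observed return G.

def incremental_mean_update(value, count, new_return):
    """Fold one new return sample into a running mean."""
    count += 1
    value += (new_return - value) / count
    return value, count

# The running mean matches the batch empirical mean:
returns = [4.0, 2.0, 6.0, 0.0]
v, n = 0.0, 0
for g in returns:
    v, n = incremental_mean_update(v, n, g)
# v == sum(returns) / len(returns) == 3.0
```

The incremental form needs only O(1) memory per state, which is why it is preferred for computer implementation over storing every observed return.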
“…It is therefore difficult to find the optimal control policy when the variables are continuous. For reinforcement learning problems with continuous variables, we can use an actor-critic algorithm based on a policy gradient [13]. Notably, the policy gradient belongs to the family of policy-based reinforcement learning methods.…”
confidence: 99%
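As a rough illustration of the actor-critic idea for continuous actions described above, here is a minimal one-step actor-critic sketch with a Gaussian policy and a linear critic. Everything here (the toy state and reward model, parameter names, and learning rates) is invented for illustration and is not from the cited papers:

```python
# Minimal one-step actor-critic for a continuous action.
# Gaussian policy pi(a|s) = N(theta*s, sigma^2); linear critic V(s) = w*s.
# The TD error from the critic scales the actor's policy-gradient step.
import random

theta, w = 0.0, 0.0          # actor and critic parameters
sigma = 0.5                  # fixed exploration noise
alpha_a, alpha_c, gamma = 0.01, 0.1, 0.9

random.seed(0)
for _ in range(2000):
    s = random.uniform(0.5, 1.5)           # toy state
    a = random.gauss(theta * s, sigma)     # sample a continuous action
    r = -(a - 2.0 * s) ** 2                # reward peaks at a = 2s
    s2 = random.uniform(0.5, 1.5)          # toy next state
    td_error = r + gamma * w * s2 - w * s  # critic's TD error
    w += alpha_c * td_error * s            # critic: semi-gradient TD(0)
    # actor: policy gradient, grad log pi(a|s) = (a - theta*s) * s / sigma^2
    theta += alpha_a * td_error * (a - theta * s) * s / sigma ** 2

# theta should drift toward 2.0, the reward-maximizing coefficient
```

The key point matching the quoted passage: because the policy is an explicit parameterized distribution, the same update works for continuous actions, where value-based methods that maximize over a discrete action set do not directly apply.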
“…Additionally, the authors in [23] proposed learning schemes that enable cognitive users to jointly learn their optimal payoffs and strategies for both continuous and discrete actions. The authors in [24] proposed an actor-critic reinforcement learning scheme for a downlink radio resource scheduling policy in Long Term Evolution-Advanced (LTE-A), achieving efficient resource scheduling while maintaining user fairness and high QoS. In [25], the authors proposed a reinforcement learning scheme that optimizes the routing strategy without human participation.…”
Section: Related Work
confidence: 99%
“…In ORA, the edge server serves as an agent that iteratively learns to make the right decision in reaction to the current state, i.e., it tries to find an optimal policy, π : S → A, that maximizes the discounted future reward R = Σ_{t=0}^{T} γ^t r_t, where T is the time horizon, r_t is the immediate reward at time t, and γ ∈ [0, 1] is a discount factor. In this paper, due to the large action space of the joint action (x, y), we employ the computationally efficient actor-critic approach of reinforcement learning to learn the policy [28], in which the agent is equipped with two neural networks: an actor network and a critic network. Note that the actor-critic approach is a combination of the Q-learning algorithm and the policy gradient algorithm.…”
Section: The Resource Allocation Algorithm
confidence: 99%
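The discounted return R = Σ_{t=0}^{T} γ^t r_t defined in the passage above can be computed without explicitly raising γ to powers, by folding the reward sequence back-to-front. A minimal sketch (function name and sample rewards are invented):

```python
# Sketch: computing the discounted return R = sum_{t=0}^{T} gamma^t * r_t.
# Iterating over rewards in reverse applies Horner's rule:
#   R_t = r_t + gamma * R_{t+1}
# which matches the definition in O(T) multiplications.

def discounted_return(rewards, gamma):
    """Return sum over t of gamma**t * rewards[t]."""
    ret = 0.0
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret

rewards = [1.0, 0.0, 2.0]
print(discounted_return(rewards, 0.5))  # 1 + 0.5*0 + 0.25*2 = 1.5
```

The same backward recursion is what gives the TD target r_t + γV(s_{t+1}) its form: each step's return is its immediate reward plus the discounted return of the remainder.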