2018
DOI: 10.1109/tnsm.2018.2863563
Towards 5G: A Reinforcement Learning-Based Scheduling Solution for Data Traffic Management

Abstract: Dominated by delay-sensitive and massive data applications, radio resource management in 5G access networks is expected to satisfy very stringent delay and packet loss requirements. In this context, the packet scheduler plays a central role by allocating user data packets in the frequency domain at each predefined time interval. Standard scheduling rules are known to be limited in satisfying higher Quality of Service (QoS) demands when facing unpredictable network conditions and dynamic traffic circumstance…

Cited by 88 publications (65 citation statements)
References 20 publications
“…We propose to use the actor-critic approach, which makes use of two functions: a) a value or critic function that keeps track of the value of the states and criticizes the actions; b) an action-value or actor function that aims to learn over time the best parameters to be applied in each state. As per the original definition, the value function $V : \mathcal{S} \to \mathbb{R}$ is determined as follows [19]:…”
Section: Value and Action-Value Functions (confidence: 99%)
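The equation itself is truncated at the quote boundary. For reference only, a minimal sketch of the standard discounted value function that this description matches (an assumed textbook formulation, not necessarily the exact equation of [19]) is:

```latex
% Assumed standard definition: expected discounted return starting from state s
V(s) = \mathbb{E}\left[ \sum_{t \ge 0} \gamma^{t} R_{t+1} \,\middle|\, s[0] = s \right],
\qquad \gamma \in [0, 1]
```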
“…where $R_{t+1} = r(s, a)$; $(\gamma^{t} R_{t+1};\ t \ge 0)$ is the accumulated reward value being averaged from state to state by the discount factor $\gamma \in [0, 1]$; $s[0]$ is considered random such that $P(s[0] = s) > 0$ holds for every $s \in \mathcal{S}$. The action-value function $Q : \mathcal{S} \times \mathcal{A} \to \mathbb{R}$ considers in addition that the first action $a[0]$ of the whole process is randomly chosen, and the function then becomes [19]:…”
Section: Value and Action-Value Functions (confidence: 99%)
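This quoted snippet also stops at the colon. Under the same assumption as above, the companion action-value definition that the sentence describes, with the first action $a[0]$ also drawn at random and then fixed, would read:

```latex
% Assumed standard definition: expected discounted return from state s
% when the first action a[0] is fixed to a
Q(s, a) = \mathbb{E}\left[ \sum_{t \ge 0} \gamma^{t} R_{t+1} \,\middle|\, s[0] = s,\; a[0] = a \right]
```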
“…In this paper, we propose a new approach to manage distributed scheduling methods autonomously across multiple cells simultaneously, reducing the requirement for cell planning and increasing the flexibility of scheduling methods in the network. A distributed approach to scheduler management is proposed by Comşa et al [21], where a flexible scheduler uses interchangeable scheduling policies to best serve the QoS requirements of its users. To select from these policies, the authors suggest a reinforcement learning framework, where the QoS offered by each scheduling policy is estimated by an individual neural network.…”
mentioning
confidence: 99%
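The selection mechanism this citation describes, one neural-network QoS estimator per interchangeable scheduling rule, can be illustrated with a small sketch. Everything below is hypothetical scaffolding rather than the authors' implementation: the SchedulerSelector class, the policy names, the tiny two-layer estimators, and the epsilon-greedy exploration are all assumptions made for illustration.

```python
# Hypothetical sketch of per-policy value estimation for scheduler selection:
# one small network per scheduling rule estimates the QoS value of applying
# that rule in the current network state; the scheduler picks the best rule.
import numpy as np


class SchedulerSelector:
    def __init__(self, policies, state_dim, hidden_dim=32, epsilon=0.1):
        self.policies = policies          # e.g. ["PF", "MT", "RR", "EDF"]
        self.epsilon = epsilon            # exploration rate (an assumption)
        rng = np.random.default_rng(0)
        # One tiny two-layer network per scheduling policy.
        self.nets = {
            p: (rng.normal(scale=0.1, size=(state_dim, hidden_dim)),
                rng.normal(scale=0.1, size=(hidden_dim,)))
            for p in policies
        }

    def estimate(self, policy, state):
        w1, w2 = self.nets[policy]
        hidden = np.tanh(state @ w1)      # hidden-layer activations
        return float(hidden @ w2)         # scalar QoS-value estimate

    def select(self, state):
        # Epsilon-greedy choice over the per-policy QoS estimates.
        if np.random.random() < self.epsilon:
            return np.random.choice(self.policies)
        return max(self.policies, key=lambda p: self.estimate(p, state))


# Usage: pick a scheduling rule for one time interval from a toy 8-feature state.
selector = SchedulerSelector(["PF", "MT", "RR", "EDF"], state_dim=8)
state = np.random.default_rng(1).normal(size=8)   # e.g. load, delay, CQI stats
print(selector.select(state))
```

In the paper's setting, the state would summarize the QoS indicators of the current transmission time interval, and the selected rule would then allocate user data packets in the frequency domain for that interval, as the abstract describes.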