2018
DOI: 10.1109/jiot.2018.2848295

Handover Control in Wireless Systems via Asynchronous Multiuser Deep Reinforcement Learning

Abstract: In this paper, we propose a two-layer framework to learn the optimal handover (HO) controllers in possibly large-scale wireless systems supporting mobile Internet-of-Things (IoT) users or traditional cellular users, where the user mobility patterns could be heterogeneous. In particular, our proposed framework first partitions the user equipments (UEs) with different mobility patterns into clusters, such that UEs in the same cluster have similar mobility patterns. Then, within each cluster, an asynchronous multi-use…
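The abstract does not state which clustering method the first layer uses. As a minimal sketch, assuming each UE is summarized by a hypothetical mobility feature vector (for example mean speed and heading variance) and that plain k-means is an acceptable stand-in for the paper's clustering step, the partitioning could look like this:

```python
import numpy as np


def cluster_ues(features: np.ndarray, k: int, iters: int = 50, seed: int = 0) -> np.ndarray:
    """Assign each UE (one row of `features`) to one of k clusters with plain k-means.

    `features` is a hypothetical per-UE mobility descriptor; k-means is an
    assumed stand-in, not necessarily the clustering used in the paper.
    """
    rng = np.random.default_rng(seed)
    # Initialize centroids by sampling k distinct UEs.
    centroids = features[rng.choice(len(features), size=k, replace=False)].astype(float)
    labels = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        # Euclidean distance from every UE to every centroid: shape (n_ues, k).
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster becomes empty.
        new_centroids = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels
```

With well-separated pedestrian-like and vehicle-like UEs, the two groups land in different clusters; each cluster would then be handled by its own learner in the second layer.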

Cited by 127 publications (78 citation statements); references 22 publications.

“…[25] adopted DRL to learn the power adaptation strategy of the primary user in a cognitive network, such that the secondary user is able to adaptively control its power and satisfy the required quality of service of both primary and secondary users. [26] studied the handover problem in a multi-user multi-BS wireless network and proposed a DRL-based handover algorithm to reduce the handover rate of each user under a minimum sum-throughput constraint. In addition, [27] proposed a distributed DRL-based multiple access algorithm to improve the uplink sum-throughput in a multi-user wireless network.…”
Section: B. Related Work; citation type: mentioning
confidence: 99%
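The excerpt only summarizes [26]. One common way to encode "reduce handovers subject to a minimum sum throughput" as a per-step reward is a penalty term; the sketch below assumes that form, and the function name, the penalty weight, and the linear shortfall penalty are hypothetical choices rather than the formulation used in [26]:

```python
def handover_reward(prev_bs: int, new_bs: int, sum_throughput: float,
                    min_sum_throughput: float, penalty_weight: float = 1.0) -> float:
    """Hypothetical per-step reward for a DRL handover agent.

    Costs 1 for each handover (serving BS changed) and adds a penalty
    proportional to how far the sum throughput falls below the minimum.
    """
    handover_cost = 1.0 if new_bs != prev_bs else 0.0
    # Only a shortfall below the required minimum is penalized.
    shortfall = max(0.0, min_sum_throughput - sum_throughput)
    return -handover_cost - penalty_weight * shortfall
```

Raising `penalty_weight` makes the agent accept more handovers in exchange for fewer throughput-constraint violations.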
“…Therefore, in this paper, we exploit the policy gradient method for policy improvement, which overcomes the limitations of the greedy search strategy by explicitly optimizing a parameterized policy. Specifically, the policy gradient theorem [24] gives the analytic expression for the gradient of the objective $J(\pi_\theta)$ with respect to the policy parameters $\theta$, written as

$$\nabla_\theta J(\pi_\theta) = \mathbb{E}_{s \sim d^{\pi_\theta},\, a \sim \pi_\theta}\left[\nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s, a)\right], \qquad (21)$$

where $d^{\pi_\theta}$ is the state distribution when following policy $\pi_\theta$. Later, Silver et al. proposed the off-policy deterministic policy gradient (OPDPG) theorem [25] to optimize a deterministic target policy by following a single stochastic behavior policy, written as…”
Section: Self-Organized DRL-Based Load Balancing; citation type: mentioning
confidence: 99%
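To make the policy gradient theorem quoted above concrete, the sketch below checks it in the simplest setting: a single state (so the state distribution is trivial), a softmax policy over a few actions, and a fixed, hypothetical action-value vector `q` standing in for $Q^{\pi_\theta}(s,a)$. The exact score-function expectation should match a finite-difference gradient of $J(\pi_\theta) = \sum_a \pi_\theta(a)\,Q(a)$, and a Monte Carlo estimate should approach it:

```python
import numpy as np


def softmax(theta: np.ndarray) -> np.ndarray:
    z = np.exp(theta - theta.max())  # shift for numerical stability
    return z / z.sum()


def objective(theta: np.ndarray, q: np.ndarray) -> float:
    # J(pi_theta) = sum_a pi_theta(a) * Q(a) for a one-state problem.
    return float(softmax(theta) @ q)


def policy_gradient_exact(theta: np.ndarray, q: np.ndarray) -> np.ndarray:
    """E_{a~pi}[grad_theta log pi(a) * Q(a)], summed exactly over actions."""
    pi = softmax(theta)
    grad = np.zeros_like(theta)
    for a in range(len(theta)):
        score = -pi.copy()
        score[a] += 1.0  # grad_theta log pi(a) = e_a - pi for a softmax policy
        grad += pi[a] * score * q[a]
    return grad


def policy_gradient_mc(theta: np.ndarray, q: np.ndarray,
                       n_samples: int = 200_000, seed: int = 0) -> np.ndarray:
    """Sampled (REINFORCE-style) estimate of the same expectation."""
    rng = np.random.default_rng(seed)
    pi = softmax(theta)
    actions = rng.choice(len(theta), size=n_samples, p=pi)
    scores = np.eye(len(theta))[actions] - pi  # one score vector per sample
    return (scores * q[actions][:, None]).mean(axis=0)
```

Because the softmax is invariant to adding a constant to every logit, the gradient components always sum to zero, which is a quick sanity check on the score function.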
“…The general idea is to parameterize the Q-functions and derive the optimal values of parameters through policy gradient. In [10], [11], [13], the problems are formulated as multi-agent control with interactions among agents. As a result, experience replay for a single agent cannot be applied in such scenarios.…”
Section: B. Applications in Wireless Network; citation type: mentioning
confidence: 99%
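For contrast with the multi-agent setting described above, here is a minimal single-agent experience replay buffer (fixed capacity, uniform sampling), written as a sketch with hypothetical class and field names. The usual concern in the multi-agent case is that other agents keep updating their policies, so transitions stored earlier may no longer reflect the environment an agent currently faces:

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple


class Transition(NamedTuple):
    state: tuple
    action: int
    reward: float
    next_state: tuple


class ReplayBuffer:
    """Single-agent buffer: the oldest transitions are evicted once full."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.buffer: Deque[Transition] = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, transition: Transition) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement breaks temporal correlation.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)
```

The buffer implicitly assumes that a stored transition stays representative of the environment, which holds for a single agent in a stationary environment but not when co-adapting agents change the dynamics.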