2018 IEEE International Conference on Communications (ICC)
DOI: 10.1109/icc.2018.8422710

Reinforcement Learning Exploration Algorithms for Energy Harvesting Communications Systems

Abstract: Prolonging the lifetime and maximizing the throughput are important factors in designing an efficient communications system, especially for energy harvesting-based systems. In this work, the problem of maximizing the throughput of a point-to-point energy harvesting communications system while prolonging its lifetime is investigated. A more realistic communications system is considered, in which the system has no a priori knowledge about the environment. The system consists of a transmitter and a receiver. …

Cited by 29 publications (17 citation statements) · References 15 publications

“…Ayatollahi et al. [139] considered a MIMO system in which the transmitter can change the number of antennas during transmission, and employed Q-learning to learn the optimal transmission policy. Masadeh et al. [140] utilized the SARSA algorithm to investigate the exploration-exploitation balancing problem and demonstrated that a convergence-based algorithm outperforms the epsilon-greedy algorithm.…”
Section: A Reinforcement Learning Based Communication Optimization I
confidence: 99%
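The comparison drawn in this excerpt, epsilon-greedy versus a convergence-oriented exploration schedule inside SARSA, can be made concrete. Below is a minimal Python sketch; the `env` interface, episode counts, and the decaying schedule standing in for the paper's convergence-based criterion are all illustrative assumptions, not the cited authors' code.

```python
import numpy as np

def sarsa(env, n_states, n_actions, episodes=500,
          alpha=0.1, gamma=0.95, epsilon_fn=lambda t: 0.1):
    """Tabular SARSA with a pluggable exploration schedule.

    epsilon_fn(t) returns the exploration rate at episode t, so the
    same loop can run a fixed epsilon-greedy policy or a decaying,
    convergence-oriented schedule.
    """
    Q = np.zeros((n_states, n_actions))

    def act(s, eps):
        if np.random.rand() < eps:            # explore
            return np.random.randint(n_actions)
        return int(np.argmax(Q[s]))           # exploit

    for t in range(episodes):
        eps = epsilon_fn(t)
        s = env.reset()                       # hypothetical env API
        a = act(s, eps)
        done = False
        while not done:
            s2, r, done = env.step(a)         # hypothetical env API
            a2 = act(s2, eps)
            # on-policy SARSA update
            Q[s, a] += alpha * (r + gamma * Q[s2, a2] * (not done) - Q[s, a])
            s, a = s2, a2
    return Q

# Fixed epsilon-greedy vs. a decaying schedule that anneals exploration
# as learning settles (a stand-in for the paper's convergence-based rule,
# whose exact form is not given in this excerpt).
fixed_eps    = lambda t: 0.1
decaying_eps = lambda t: max(0.01, 1.0 / (1 + 0.05 * t))
```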
“…Next, at the (m+1)-th iteration, each node attempts to learn the Nash maximizer F*_{m+1}. A discrete-time MFG is said to have the fixed-point property (FPP) if and only if the procedure described by (14), (15), and (16) converges.…”
Section: MF-MARL For Distributed Power Control
confidence: 99%
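The fixed-point property referenced here amounts to convergence of an iterated map. As a hedged illustration (the actual best-response and state-evolution maps of the cited Eqs. (14)-(16) are not reproduced in this excerpt), a generic convergence check looks like this:

```python
import numpy as np

def has_fixed_point(update, pi0, tol=1e-6, max_iter=1000):
    """Iterate pi_{m+1} = update(pi_m) and report whether it converges.

    `update` is a placeholder for the composed best-response /
    state-evolution map of the cited game.
    """
    pi = np.asarray(pi0, dtype=float)
    for _ in range(max_iter):
        pi_next = update(pi)
        if np.abs(pi_next - pi).sum() < tol:  # L1 distance between iterates
            return True, pi_next
        pi = pi_next
    return False, pi

# Example with a contractive toy map: mix each iterate toward a target.
target = np.array([0.2, 0.3, 0.5])
converged, pi_star = has_fixed_point(lambda p: 0.5 * p + 0.5 * target,
                                     pi0=np.array([1.0, 0.0, 0.0]))
```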
“…This is because the distributed DNN approach does not use the estimate of the distribution π for learning the policy; i.e., in the distributed DNN approach, π is used only for sampling the states of the other nodes. Also, from (16), regardless of the update frequency, the iterative estimates of the distribution π converge over time, and the states of the other nodes are sampled from the correct distribution. In contrast, for MF-MARL, the estimates of π are used critically for learning; therefore, estimation inaccuracies jeopardize the learning procedure and may adversely affect the sum-throughput.…”
Section: Convergence and Effect Of Hyperparameters
confidence: 99%
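To illustrate the distinction the excerpt draws, here is a sketch of how an estimate of π might enter the distributed DNN pipeline only as a sampler of the other nodes' states. All names and shapes are hypothetical, not from the cited work.

```python
import numpy as np

def dnn_input(own_state, pi_hat, n_others, rng):
    """Build the network input by drawing the other nodes' states
    from the current mean-field estimate pi_hat.

    pi_hat affects only which states are sampled; it does not appear
    in the loss, so moderate estimation error mostly adds input noise
    rather than biasing the learning target (per the excerpt above).
    """
    others = rng.choice(len(pi_hat), size=n_others, p=pi_hat)
    return np.concatenate(([own_state], others))

# Example: 4 discrete states, uniform estimate of pi, 5 other nodes.
rng = np.random.default_rng(0)
x = dnn_input(own_state=2, pi_hat=np.full(4, 0.25), n_others=5, rng=rng)
```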
“…One of the promising approaches is reinforcement learning (RL), a class of algorithms that can optimize system performance in unknown environments [15], [16]. In [13], an EH point-to-point communications system is investigated. The EH and channel-gain processes are modeled as Markov processes, and Q-learning is used to learn a transmission power allocation policy that maximizes the amount of data arriving at the destination.…”
Section: Introduction
confidence: 99%
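The setup summarized in this excerpt (Markov EH and channel processes, Q-learning over a battery/channel state) maps naturally onto tabular Q-learning. The following Python sketch makes assumptions everywhere the excerpt is silent: the power levels, harvesting probabilities, channel transition matrix, and the log(1 + p·g) throughput reward are all illustrative, not the values from [13].

```python
import numpy as np

rng = np.random.default_rng(0)

B_MAX, N_CHANNELS = 10, 3            # battery levels, channel-gain states
POWERS = [0, 1, 2, 4]                # assumed transmit-power levels
GAINS  = [0.5, 1.0, 2.0]             # assumed channel gains
P_HARVEST = [0.5, 0.3, 0.2]          # Pr(harvest 0/1/2 energy units per slot)
# Assumed Markov chain over channel states (rows sum to 1)
P_CH = np.array([[0.6, 0.3, 0.1],
                 [0.2, 0.6, 0.2],
                 [0.1, 0.3, 0.6]])

Q = np.zeros((B_MAX + 1, N_CHANNELS, len(POWERS)))
alpha, gamma, eps = 0.1, 0.95, 0.1

b, h = B_MAX, 1                       # initial battery level, channel state
for step in range(100_000):
    # Only powers the battery can supply are feasible actions.
    feasible = [i for i, p in enumerate(POWERS) if p <= b]
    if rng.random() < eps:
        a = rng.choice(feasible)                      # explore
    else:
        a = max(feasible, key=lambda i: Q[b, h, i])   # exploit
    # Reward: per-slot throughput proxy, log(1 + p * gain)
    r = np.log1p(POWERS[a] * GAINS[h])
    # Battery evolves with consumption and random harvesting.
    b2 = min(B_MAX, b - POWERS[a] + rng.choice(3, p=P_HARVEST))
    h2 = rng.choice(N_CHANNELS, p=P_CH[h])
    # Off-policy Q-learning update over the (battery, channel) state
    Q[b, h, a] += alpha * (r + gamma * Q[b2, h2].max() - Q[b, h, a])
    b, h = b2, h2
```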