Prolonging the lifetime and maximizing the throughput are important factors in designing an efficient communications system, especially for energy harvesting-based systems. In this work, the problem of maximizing the throughput of a point-to-point energy harvesting communications system while prolonging its lifetime is investigated. The work considers a more realistic communications system that has no a priori knowledge about its environment. The system consists of a transmitter and a receiver. The transmitter is equipped with an infinite buffer to store data and with energy harvesting capability, allowing it to harvest renewable energy and store it in a finite battery. The problem of finding an efficient power allocation policy is formulated as a reinforcement learning problem. Two different exploration algorithms are used: the convergence-based algorithm and the epsilon-greedy algorithm. The first balances exploration and exploitation using the action-value function convergence error and an exploration time threshold, whereas the second balances them through the exploration probability (i.e., epsilon). Simulation results show that the convergence-based algorithm outperforms the epsilon-greedy algorithm. The effects of the parameters of each algorithm are also investigated.
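For illustration, below is a minimal sketch of the two exploration rules as they might be applied to a tabular action-value function. The function names, default thresholds, and the exact form of the convergence test are assumptions based on the description above, not the paper's own implementation.

```python
import random

def epsilon_greedy_action(Q, state, actions, epsilon=0.1):
    """Epsilon-greedy exploration: pick a random action (e.g., transmission
    power level) with probability epsilon, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

def convergence_based_action(Q, Q_prev, state, actions, step,
                             error_threshold=1e-3, exploration_time=10_000):
    """Convergence-based exploration (one plausible reading of the abstract):
    keep exploring while the action-value estimates at this state still change
    by more than error_threshold and the exploration time budget has not been
    exhausted; afterwards, act greedily."""
    error = max(abs(Q.get((state, a), 0.0) - Q_prev.get((state, a), 0.0))
                for a in actions)
    if step < exploration_time and error > error_threshold:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))
```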
Data forwarding from a source to a sink node when they are not within communication range is a challenging problem in wireless networking. With the increasing demand for wireless networks, several applications have emerged in which a group of users is disconnected from its targeted destinations. Therefore, we consider in this paper a multi-Unmanned Aerial Vehicle (UAV) system that conveys collected data from isolated fields to the base station. In each field, a group of sensors or Internet of Things devices is distributed and sends its data to one UAV. The UAVs collaborate in forwarding the collected data to the base station so as to maximize the minimum battery level over all UAVs by the end of the service time. Hence, a group of UAVs can meet at a waypoint along their path to the base station such that one UAV collects the data from all other UAVs and moves forward to another meeting point or to the base station, while the UAVs that relayed their messages return to their initial locations. All collected data from all fields must reach the base station within a given maximum time to guarantee a certain quality of service. We formulate the problem as a Mixed Integer Nonlinear Program (MINLP) and then reformulate it as a Mixed Integer Linear Program (MILP) after linearizing the mathematical model. Simulation results show that the proposed model uses the UAVs' energy more efficiently.
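To illustrate the max-min objective and the standard way it is linearized, here is a minimal PuLP sketch that maximizes the minimum final battery level while choosing one forwarding UAV. The UAV set, battery levels, and energy costs are hypothetical, and the sketch omits the routing, meeting-point, and deadline constraints of the actual MILP.

```python
import pulp

# Hypothetical toy instance: names, battery levels, and energy costs are
# illustrative assumptions, not values from the paper.
uavs = ["uav1", "uav2", "uav3"]
initial = {"uav1": 90.0, "uav2": 80.0, "uav3": 100.0}
relay_cost = {"uav1": 15.0, "uav2": 10.0, "uav3": 20.0}    # fly to meeting point and back
forward_cost = {"uav1": 40.0, "uav2": 45.0, "uav3": 30.0}  # carry all data onward

prob = pulp.LpProblem("max_min_final_battery", pulp.LpMaximize)

# x[u] = 1 if UAV u is the one that carries the aggregated data forward.
x = pulp.LpVariable.dicts("forward", uavs, cat="Binary")
z = pulp.LpVariable("min_battery", lowBound=0)  # auxiliary max-min variable

prob += z                                     # objective: maximize the worst battery
prob += pulp.lpSum(x[u] for u in uavs) == 1   # exactly one forwarding UAV

# Linearized max-min: z is bounded above by every UAV's final battery level.
for u in uavs:
    prob += z <= initial[u] - relay_cost[u] - forward_cost[u] * x[u]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
chosen = [u for u in uavs if pulp.value(x[u]) > 0.5]
print(chosen, pulp.value(z))
```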
Energy harvesting communications systems are able to provide high-quality communications services using green energy sources. This paper presents an autonomous energy harvesting communications system that can adapt to any environment and optimize its behavior with experience to maximize the valuable received data. The considered system is a point-to-point energy harvesting communications system consisting of a source and a destination and operating in an unknown and uncertain environment. The source is an energy harvesting node capable of harvesting solar energy and storing it in a finite-capacity battery. Energy can be harvested, stored, and used from continuous ranges of energy values, and channel gains can take any value within a continuous range. Since exact information about future channel gains and harvested energy is unavailable, an architecture based on actor-critic reinforcement learning is proposed to learn a close-to-optimal transmission power allocation policy. The actor uses a stochastic parameterized policy to select actions at states; the policy is modeled by a normal distribution with a parameterized mean and standard deviation, and the actor uses policy gradient to optimize the policy's parameters. The critic uses a three-layer neural network to approximate the action-value function and to evaluate the optimized policy. Simulation results evaluate the proposed actor-critic architecture and show its ability to improve its performance with experience.
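A minimal PyTorch sketch of the described architecture follows: a Gaussian policy with a parameterized mean and standard deviation, a three-layer critic approximating the action-value function, and a policy-gradient actor update. The layer sizes, activations, and update details are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class GaussianActor(nn.Module):
    """Stochastic policy: a normal distribution over transmission power with a
    parameterized mean and standard deviation (hidden size is an assumption)."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh())
        self.mean = nn.Linear(hidden, 1)
        self.log_std = nn.Parameter(torch.zeros(1))

    def forward(self, state):
        h = self.body(state)
        return torch.distributions.Normal(self.mean(h), self.log_std.exp())

class QCritic(nn.Module):
    """Three-layer network approximating the action-value function Q(s, a)."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def actor_step(actor, critic, state, optimizer):
    """One policy-gradient update: push the policy toward actions the critic
    rates highly (a sampled transmission power would be clipped to feasible,
    nonnegative values before being applied)."""
    dist = actor(state)
    action = dist.sample()
    q = critic(state, action)
    loss = -(dist.log_prob(action) * q.detach()).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```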