In this paper, we consider the problem of minimizing the transmission completion time for energy harvesting devices over time-varying channels using a reinforcement learning approach. Because of the randomness of energy arrivals and channel fading in wireless communications, reinforcement learning algorithms often converge to suboptimal points with degraded performance. To address this problem, we first prove that the expected discounted reward sum in this environment is a monotonically increasing function of the negative of time, the amount of data sent, the channel gain, the harvested energy, and the remaining battery level. We leverage this result to construct a partially monotonic network that efficiently approximates the optimal action-value function during learning. Experimental results show that our approach, by exploiting the partial monotonicity of the desired function, outperforms existing power allocation policies. Further experiments show that the performance of our learning-based approach is close to the theoretical upper bound over rapidly time-varying channels.

Index Terms: energy harvesting communications, transmission completion time minimization, reinforcement learning.
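To make the notion of a partially monotonic action-value network concrete, the following is a minimal sketch, not the authors' exact architecture: it enforces that the output is non-decreasing in a chosen subset of inputs by passing them through non-negative (softplus-mapped) weights and monotone activations, while the remaining inputs use unconstrained weights. The class name `PartiallyMonotonicQNet` and the specific layer sizes are illustrative assumptions.

```python
# Sketch (assumption, not the paper's exact design) of a Q-network whose
# output is guaranteed non-decreasing in a designated subset of inputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartiallyMonotonicQNet(nn.Module):
    def __init__(self, n_mono, n_free, hidden=64):
        super().__init__()
        # Raw parameters; softplus maps them to non-negative weights,
        # which preserves monotonicity w.r.t. the monotone inputs.
        self.w_mono = nn.Parameter(torch.randn(hidden, n_mono) * 0.1)
        self.fc_free = nn.Linear(n_free, hidden)  # unconstrained path
        self.w_out = nn.Parameter(torch.randn(1, hidden) * 0.1)
        self.b_out = nn.Parameter(torch.zeros(1))

    def forward(self, x_mono, x_free):
        # Non-negative weights on the monotone features, free weights
        # on the rest; ReLU is itself monotone non-decreasing, so the
        # composition stays non-decreasing in x_mono.
        h = torch.relu(F.linear(x_mono, F.softplus(self.w_mono))
                       + self.fc_free(x_free))
        return F.linear(h, F.softplus(self.w_out), self.b_out)

# Illustrative usage: the monotone block would carry the features the
# paper proves monotonicity for, e.g. (-t, data_sent, channel_gain,
# harvested_energy, battery); time is negated so the estimated value
# is non-increasing in t. Any remaining features go to x_free.
q = PartiallyMonotonicQNet(n_mono=5, n_free=3)
q_value = q(torch.randn(4, 5), torch.randn(4, 3))
```

Constraining only the provably monotone inputs, rather than all of them, keeps the hypothesis class expressive for the unconstrained features while shrinking the search space for the constrained ones, which is the intuition behind the reported convergence improvement.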