Optimal power allocation in wireless networks is generally known to be a complex optimization problem. In this paper, we present a simple and energy-efficient distributed power control scheme for downlink Non-Orthogonal Multiple Access (NOMA) based on a Reinforcement Learning (RL) game-theoretic approach. We consider a scenario consisting of multiple Base Stations (BSs), each serving its respective Near User(s) (NU) and Far User(s) (FU). The aim of the game is to optimize the achievable rate fairness of the BSs in a distributed manner by appropriately choosing the BSs' power levels through trial and error. By resorting to a carefully chosen utility based on the concept of marginal cost pricing, in which a BS pays a virtual tax offsetting the interference its presence causes to the other BSs, we design a potential game that meets this objective. As the RL scheme, we adopt Learning Automata (LA) due to their simplicity and computational efficiency, and we derive analytical results establishing the optimality of the game and its convergence to a Nash Equilibrium (NE). Numerical results not only demonstrate the convergence of the proposed algorithm to a desirable equilibrium that maximizes fairness, but also confirm the correctness of the proposal through a thorough comparison with random and heuristic approaches.
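To make the LA-based power selection concrete, the following is a minimal illustrative sketch of a single learning automaton choosing among discrete power levels. It assumes a standard linear reward-inaction (L_{R-I}) update and a toy utility combining a rate term with a virtual interference tax; the number of power levels, the utility form, the reward normalization, and the learning rate are all hypothetical placeholders and are not taken from the paper.

```python
import numpy as np

# Illustrative sketch only: a single Learning Automaton selecting one of K
# discrete power levels with the linear reward-inaction (L_{R-I}) rule.
# The utility, normalization, and parameters below are assumptions for the
# sketch, not the paper's exact scheme.

rng = np.random.default_rng(0)

K = 4                                        # number of candidate power levels (assumed)
power_levels = np.linspace(0.25, 1.0, K)     # hypothetical normalized power levels
probs = np.full(K, 1.0 / K)                  # action probabilities, initialized uniform
step = 0.05                                  # learning rate (lambda)

def utility(chosen_power):
    """Toy stand-in for a BS utility: achievable rate minus a virtual interference tax."""
    rate = np.log2(1.0 + 10.0 * chosen_power)   # toy rate term
    tax = 0.8 * chosen_power                    # toy marginal-cost-pricing tax
    return rate - tax

for t in range(2000):
    a = rng.choice(K, p=probs)                  # sample a power level
    beta = np.clip(utility(power_levels[a]) / 3.0, 0.0, 1.0)  # reward scaled into [0, 1]
    # L_{R-I}: shift probability mass toward the chosen action, scaled by the reward.
    probs = probs - step * beta * probs
    probs[a] += step * beta
    probs /= probs.sum()                        # guard against rounding drift

print("final action probabilities:", np.round(probs, 3))
print("selected power level:", power_levels[np.argmax(probs)])
```

In the distributed setting described above, each BS would run such an automaton on its own utility, so the probability vectors jointly drift toward a pure-strategy equilibrium of the underlying potential game.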