Non-terrestrial networks (NTNs), comprising airborne and spaceborne platforms such as low Earth orbit (LEO) satellites, high-altitude platform systems (HAPS), and unmanned aerial vehicles (UAVs), are expected to be integral components of next-generation cellular systems. With the deployment of 5G services and beyond, NTN platforms can act as aerial base stations that provide ubiquitous connectivity and service to ground users, or be deployed as aerial users connected to the cellular network. NTN-aided wireless communication offers multiple benefits, including mobility, flexibility, wide coverage, and resilience to physical attacks on ground infrastructure. However, current terrestrial cellular systems were not designed with aerial users in mind, and non-terrestrial platforms operate under constraints such as stringent service requirements and the limited power and storage available on high-throughput satellites. As a result, resource allocation, the placement of high-altitude platform base stations, and the flight trajectories of UAVs must be intelligently controlled to satisfy objectives at both the aerial base station and the overall network level. To this end, many works have explored Reinforcement Learning (RL) techniques that allow aerial platforms in non-terrestrial networks to learn (near-)optimal control policies from past observations. In this paper, and in contrast to prior surveys, we contribute a comprehensive review of the control objectives of non-terrestrial platforms that have been addressed using RL formulations. We provide an up-to-date overview of the latest applications of RL techniques to different aspects of NTN-aided wireless communication. The survey focuses on Markov Decision Process (MDP) formulations in terms of states, actions, and rewards. We synthesize a taxonomy from the surveyed literature and provide a comprehensive picture of the current uses of RL in NTN-aided wireless communications. We also present a qualitative analysis of the level of realism achieved by the surveyed works, based on factors pertaining to the simulation environment, base station deployment setting, wireless channel assumptions, and energy considerations. Finally, we curate a list of open challenges that the research community should address in order to achieve more efficient deployments and close the simulation-to-reality gap.
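As an illustration of the MDP viewpoint adopted throughout the survey, the following is a minimal, hypothetical sketch of how UAV base station control might be cast as an MDP; the chosen state variables (position, residual energy, user queue backlogs), the discrete movement and power action set, and the energy weight $\lambda$ are assumptions made for exposition, not a formulation taken from any particular surveyed work.
\[
\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad
s_t = \big(x_t, y_t, h_t, E_t, \{q_{k,t}\}_{k=1}^{K}\big), \qquad
a_t \in \{\text{hover}, \text{N}, \text{S}, \text{E}, \text{W}\} \times \mathcal{P}, \qquad
r_t = \sum_{k=1}^{K} \log_2\!\big(1 + \mathrm{SINR}_{k,t}\big) - \lambda\, e_t,
\]
where $(x_t, y_t, h_t)$ denotes the UAV position, $E_t$ its residual energy, $q_{k,t}$ the queue backlog of ground user $k$, $\mathcal{P}$ a discrete set of transmit power levels, and $e_t$ the energy consumed in slot $t$; the RL agent then seeks a policy $\pi$ that maximizes the expected discounted return $\mathbb{E}\big[\sum_{t} \gamma^{t} r_t\big]$.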