Causal Reinforcement Learning (CRL) is an emerging field that integrates two areas essential to the development of artificial intelligence. Existing work in the area has shown how causality can help mitigate some of the limitations of reinforcement learning (RL), such as data inefficiency, lack of interpretability, and long learning times, among others. However, how reinforcement learning can be used to support causal discovery (CD) has so far been less explored. In this article, we introduce CARL, a Causality-Aware Reinforcement Learning framework for simultaneously learning and using causal models to speed up policy learning in online Markov decision process (MDP) settings. In a synergistic way, our method alternates between: (i) RL for CD, where it promotes the selection of actions that yield better causal models in fewer episodes than traditional data-collection strategies in RL; (ii) CD, where a score-based algorithm is used to learn causal models; and (iii) RL using CD, where the learned models are used to select actions that speed up learning of the optimal policy by reducing the number of interactions with the environment. Experiments in simulated environments show that our method achieves better policy-learning results than traditional model-free and model-based algorithms while also learning the underlying causal models. We also show that the learned causal models can be transferred directly to a similar task of greater complexity, significantly reducing the number of episodes needed to learn an optimal policy. Finally, we verify the method's scalability to high-dimensional states, where the action-value function must be represented with deep neural networks.
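To make the alternation among the three phases concrete, the following is a minimal sketch of the training loop it implies. This is only an illustrative outline under our own assumptions: the environment interface (`env.reset`, `env.step`, `env.actions`) and all helper functions are hypothetical placeholders, not the CARL implementation or its API.

```python
import random

def select_informative_action(env, state, transitions):
    # (i) RL for CD: placeholder for choosing actions expected to be
    # informative for structure learning; here it samples uniformly.
    return random.choice(env.actions)

def score_based_causal_discovery(transitions):
    # (ii) CD: placeholder for a score-based structure-learning step
    # (e.g., greedy search over graphs maximizing a fit score).
    return {"graph": None, "fitted_on": len(transitions)}

def select_action_with_model(env, state, causal_model):
    # (iii) RL using CD: placeholder for planning with the learned
    # causal model; here it again samples uniformly.
    return random.choice(env.actions)

def carl_loop(env, num_episodes, refit_every=10):
    """Alternate between acting to support causal discovery, fitting a
    causal model, and acting with the model to speed up policy learning."""
    transitions, causal_model = [], None
    for episode in range(num_episodes):
        state, done = env.reset(), False
        while not done:
            if causal_model is None:
                action = select_informative_action(env, state, transitions)
            else:
                action = select_action_with_model(env, state, causal_model)
            next_state, reward, done = env.step(action)
            transitions.append((state, action, next_state, reward))
            state = next_state
        # Periodically refit the causal model from the gathered transitions.
        if (episode + 1) % refit_every == 0:
            causal_model = score_based_causal_discovery(transitions)
    return causal_model
```

In this sketch, the agent acts to gather structure-relevant data until a causal model is available, after which the model guides action selection, mirroring the interplay between the three phases described above.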