As electricity prices rise, cost savings through load shifting are becoming increasingly important for energy end users. While dynamic pricing encourages customers to shift demand to low-price periods, the non-stationary and highly volatile nature of electricity prices poses a significant challenge to energy management systems. In this paper, we investigate the flexibility potential of data centres by optimising heating, ventilation, and air conditioning (HVAC) systems with a general model-free reinforcement learning approach. Since the soft actor-critic algorithm with feed-forward networks did not perform satisfactorily in this scenario, we instead propose a parameterisation with a recurrent neural network architecture to successfully handle spot-market price data. The past is encoded into a hidden state, which provides a way to learn the temporal dependencies in the observations and the highly volatile rewards. The proposed method is evaluated in experiments on a simulated data centre. Using real temperature and price signals over multiple years, the results show a cost reduction compared to a proportional-integral-derivative (PID) controller while maintaining the temperature of the data centre within the desired operating range. This work thus demonstrates an innovative and applicable reinforcement learning approach that incorporates complex economic objectives into agent decision-making. The proposed control method can be integrated into various Internet of Things-based smart building solutions for energy management.
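
To illustrate how past observations can be encoded into a hidden state, the following is a minimal sketch rather than the architecture used in this work. It assumes a plain tanh recurrent cell with untrained random weights; the function names (`rnn_step`, `encode_history`, `make_params`), the hidden size, and the scaled temperature and price values are all hypothetical. In a recurrent soft actor-critic agent, a vector like `h` would be passed to the actor and critic in place of the raw current observation.

```python
import math
import random


def rnn_step(h, x, W_h, W_x, b):
    """One recurrent update: h' = tanh(W_h h + W_x x + b)."""
    return [
        math.tanh(
            sum(W_h[i][j] * h[j] for j in range(len(h)))
            + sum(W_x[i][k] * x[k] for k in range(len(x)))
            + b[i]
        )
        for i in range(len(h))
    ]


def encode_history(observations, W_h, W_x, b):
    """Fold a sequence of observations into a single hidden state."""
    h = [0.0] * len(b)  # start from an all-zero hidden state
    for x in observations:
        h = rnn_step(h, x, W_h, W_x, b)
    return h


def make_params(hidden_size, obs_size, seed=0):
    """Random, untrained weights (assumption: for illustration only)."""
    rng = random.Random(seed)
    W_h = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
           for _ in range(hidden_size)]
    W_x = [[rng.uniform(-0.5, 0.5) for _ in range(obs_size)]
           for _ in range(hidden_size)]
    b = [0.0] * hidden_size
    return W_h, W_x, b


# Hypothetical observations: [scaled zone temperature, scaled spot price]
history = [[0.22, 0.40], [0.23, 0.90], [0.24, 0.10]]
W_h, W_x, b = make_params(hidden_size=4, obs_size=2)
h = encode_history(history, W_h, W_x, b)
```

Because each update depends on the previous hidden state, the same observations presented in a different order yield a different `h`, which is what allows a policy conditioned on `h` to react to price trends rather than only to the current price.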