Thermostatically Controlled Loads (TCLs) provide demand flexibility and are often considered well suited for Demand Response (DR) applications. Due to their heterogeneity, and the consequent lack of accurate dynamics models, Reinforcement Learning (RL) is often used to exploit this flexibility. Unfortunately, RL requires exploratory interaction with the TCL, resulting in a period of potential discomfort for the users. We present an approach to reduce this exploratory period by pretraining the RL agent, using domain randomization to facilitate knowledge transfer. We evaluate the potential of pretraining in a DR energy arbitrage scenario with an Electric Water Heater (EWH). Our experiments show that a priori knowledge about EWH dynamics can be used to initialize and improve the control policy, with pretraining contributing 8.8 % additional cost savings compared to starting from scratch.
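The abstract describes pretraining with domain randomization only at a high level. The sketch below is a minimal illustration of the general idea, not the paper's implementation: a toy first-order EWH model, a tabular Q-learning agent, and parameter ranges that are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class EWHEnv:
    """Simplified first-order electric water heater model (illustrative,
    not the paper's simulator). State: tank temperature; action: heat on/off."""

    def __init__(self, volume_l, loss_coeff, power_kw=2.4, dt_h=0.25):
        self.c = 4.186 * volume_l / 3600.0   # tank heat capacity in kWh per degC
        self.loss = loss_coeff               # standing loss in kW per degC above ambient
        self.power = power_kw
        self.dt = dt_h
        self.temp = 55.0

    def step(self, heat_on, price, ambient=20.0, demand_kw=0.3):
        # Energy balance: heating minus standing losses minus hot-water draw.
        q = self.power * heat_on - self.loss * (self.temp - ambient) - demand_kw
        self.temp += q * self.dt / self.c
        cost = price * self.power * heat_on * self.dt
        comfort_penalty = 10.0 if self.temp < 45.0 else 0.0  # discomfort proxy
        return -(cost + comfort_penalty)

def sample_randomized_env():
    """Domain randomization: draw plausible EWH parameters so the policy
    sees many different dynamics during pretraining. Ranges are assumptions."""
    return EWHEnv(volume_l=rng.uniform(100, 300),
                  loss_coeff=rng.uniform(0.002, 0.01))

# Tabular Q-learning over a coarse (temperature bin, price bin) state space.
N_T, N_P = 12, 4
Q = np.zeros((N_T, N_P, 2))

def state(env, price):
    t = int(np.clip((env.temp - 40.0) / 30.0 * N_T, 0, N_T - 1))
    p = int(np.clip(price / 0.4 * N_P, 0, N_P - 1))
    return t, p

alpha, gamma, eps = 0.1, 0.95, 0.2
for episode in range(500):                     # pretrain across randomized EWHs
    env = sample_randomized_env()
    for step in range(96):                     # one simulated day, 15-min steps
        price = 0.1 + 0.2 * (step % 24 > 16)   # toy two-level tariff
        s = state(env, price)
        a = rng.integers(2) if rng.random() < eps else int(np.argmax(Q[s]))
        r = env.step(a, price)
        s2 = state(env, price)
        Q[s][a] += alpha * (r + gamma * Q[s2].max() - Q[s][a])
```

In this sketch, the pretrained Q-table would then initialize the policy deployed on the real EWH, shortening the exploratory (and potentially uncomfortable) phase the abstract refers to.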