Energy efficiency and consumption control remain important topics for heating, ventilation, and air conditioning (HVAC) systems. Deep reinforcement learning (DRL) is an emerging technique for optimizing energy consumption. Its advantage lies in its ability to handle the time-series nature of energy data and the complexity introduced by environmental factors. However, most DRL algorithms do not consider time-of-use electricity pricing and thermal comfort together. This paper proposes a hybrid approach for HVAC systems that combines the twin delayed deep deterministic policy gradient algorithm with model predictive control (TD3-MPC), in order to mitigate function approximation errors and to save cost by pre-adjusting building temperatures during off-peak periods. The proposed method is compared with the deep deterministic policy gradient (DDPG) algorithm in simulations of five building zones. Experimental results demonstrate that TD3-MPC outperforms DDPG and potentially saves 16% of the total energy consumption cost, with better stability and robustness.
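
As a rough illustration of how TD3 mitigates function approximation error, the sketch below computes the clipped double-Q target of the standard TD3 algorithm, which bootstraps from the smaller of two critic estimates to curb overestimation. It is not taken from this paper's implementation: the function name `td3_target`, its arguments, and the discount value are assumptions made for illustration only.

```python
def td3_target(reward, next_q1, next_q2, done, gamma=0.99):
    """Clipped double-Q target from standard TD3 (illustrative sketch).

    reward  : immediate reward, e.g. negative energy cost (hypothetical)
    next_q1 : first target critic's estimate for the next state-action pair
    next_q2 : second target critic's estimate for the same pair
    done    : True if the episode ended at this step
    gamma   : discount factor (assumed value, not from the paper)
    """
    # Taking the minimum of the two critics counteracts the overestimation
    # bias that a single learned Q-function tends to accumulate.
    min_next_q = min(next_q1, next_q2)
    # No bootstrapping from the next state once the episode has terminated.
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * min_next_q


# Usage: the critics disagree (10.0 vs 8.0), so the smaller value is used.
target = td3_target(reward=1.0, next_q1=10.0, next_q2=8.0, done=False, gamma=0.5)
print(target)  # 5.0
```

DDPG, the baseline in the comparison, uses a single critic in this step, which is the main structural difference the sketch is meant to highlight.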