Energy management strategy (EMS) is a key technology for improving the fuel efficiency of hybrid electric vehicles (HEVs). In recent years, advances in artificial intelligence have enabled tremendous progress in training and deploying deep neural network-based EMS with reinforcement learning (RL). However, in contrast to deep learning fields such as computer vision and natural language processing, which rely mainly on large-scale offline datasets, most RL policies must be trained online by trial and error, with initial performance that is essentially arbitrary. Such a paradigm is inefficient and unsafe for industrial automation and therefore confines RL-based EMS to the simulation world. Given that large historical interaction datasets are readily available in the EMS domain, an RL algorithm that can extract a policy purely offline from previously collected data and improve upon the data-logging policy would largely mitigate the issues that currently prevent the widespread use of RL methods, namely sample inefficiency, unsafe exploration, and the simulation-to-real gap. To this end, this paper presents a feasible algorithmic framework for model-based offline RL. Unlike vanilla RL approaches, which take no precautions against distributional shift, the framework first builds a data-driven dynamics model and then trains the policy with RL inside that model. Two techniques, a conservative MDP formulation and state regularization, are then incorporated and shown to be effective against model over-exploitation. A hardware-in-the-loop (HIL) test is used to evaluate the real-time performance and effectiveness of the different strategies on a real controller, and it is found that even the naïve data-driven model-based offline RL algorithm already mitigates, to a large extent, the distributional-shift problem inherent in vanilla RL. Furthermore, by incorporating uncertainty-aware guidance, a near-optimal policy can be obtained using only the dataset logged from a sub-optimal controller.
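To make the conservative-MDP idea mentioned above concrete, the sketch below illustrates one common way such a penalty can be realized in model-based offline RL: rewards predicted by a learned dynamics model are reduced in proportion to the model's epistemic uncertainty, here estimated from ensemble disagreement. This is only a minimal illustrative example under those assumptions; all names (EnsembleDynamics, conservative_step, penalty_coef) are hypothetical and do not reflect the paper's actual implementation.

```python
# Minimal sketch of an uncertainty-penalised ("conservative") model step for
# offline RL. Illustrative only; assumes an ensemble of learned dynamics models.

import numpy as np


class EnsembleDynamics:
    """Toy ensemble of learned dynamics models f_i(s, a) -> (next_state, reward)."""

    def __init__(self, models):
        self.models = models  # list of callables (s, a) -> (s_next, r)

    def step(self, state, action):
        preds = [m(state, action) for m in self.models]
        next_states = np.stack([p[0] for p in preds])
        rewards = np.array([p[1] for p in preds])
        # Epistemic uncertainty: disagreement among ensemble members.
        uncertainty = next_states.std(axis=0).max()
        return next_states.mean(axis=0), rewards.mean(), uncertainty


def conservative_step(ensemble, state, action, penalty_coef=1.0):
    """Penalise the predicted reward by model uncertainty, discouraging the
    policy from exploiting regions where the learned model is unreliable."""
    next_state, reward, uncertainty = ensemble.step(state, action)
    return next_state, reward - penalty_coef * uncertainty


if __name__ == "__main__":
    # Two slightly different linear "models" standing in for a trained ensemble.
    models = [
        lambda s, a, k=k: (s + (1.0 + 0.05 * k) * a, float(-np.sum(a ** 2)))
        for k in range(2)
    ]
    ensemble = EnsembleDynamics(models)
    s, a = np.zeros(3), np.ones(3)
    s_next, r_pen = conservative_step(ensemble, s, a, penalty_coef=5.0)
    print("next state:", s_next, "penalised reward:", r_pen)
```

In this sketch, the penalty coefficient trades off conservatism against optimism: larger values keep synthetic rollouts closer to the logged data distribution, which is the behaviour the abstract attributes to the uncertainty-aware variant of the framework.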