Online reinforcement learning for dialogue agents is expensive because training requires interaction with real users. A user simulator is a commonly used alternative, but its environment is not identical to that of real users, and it cannot reproduce the atypical and more varied conversational behavior that is a hallmark of human spontaneity. We employ offline reinforcement learning with a Transformer, casting dialogue policy learning as a sequence modeling problem: the model learns the joint distribution of state, action, and reward sequences and generates optimal dialogue actions. Evaluation on the MultiWOZ dataset shows that the Decision Transformer (DT) improves the efficiency of DRL dialogue agents and enhances dialogue robustness.
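As a rough illustration of the sequence-modeling framing described above (not the paper's implementation), the sketch below shows a Decision-Transformer-style policy over interleaved return-to-go, dialogue-state, and action tokens, trained to predict the next system action. All names and dimensions (`state_dim`, `n_actions`, `max_len`, etc.) are illustrative assumptions.

```python
# Minimal sketch of a DT-style dialogue policy; dimensions and vocabularies are placeholders.
import torch
import torch.nn as nn


class DialogueDecisionTransformer(nn.Module):
    def __init__(self, state_dim=100, n_actions=300, d_model=128,
                 n_layers=3, n_heads=4, max_len=20):
        super().__init__()
        # Separate embeddings for return-to-go, dialogue-state, and action tokens.
        self.embed_rtg = nn.Linear(1, d_model)
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Embedding(n_actions, d_model)
        self.embed_time = nn.Embedding(max_len, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, n_layers)
        self.action_head = nn.Linear(d_model, n_actions)

    def forward(self, rtg, states, actions):
        # rtg:     (B, T, 1)        returns-to-go
        # states:  (B, T, state_dim) dialogue-state features
        # actions: (B, T)           previous system action ids
        B, T, _ = states.shape
        pos = self.embed_time(torch.arange(T, device=states.device).unsqueeze(0))
        r = self.embed_rtg(rtg) + pos
        s = self.embed_state(states) + pos
        a = self.embed_action(actions) + pos
        # Interleave tokens as (R_1, s_1, a_1, R_2, s_2, a_2, ...).
        seq = torch.stack([r, s, a], dim=2).reshape(B, 3 * T, -1)
        # Causal mask: each token attends only to earlier tokens.
        mask = torch.triu(torch.ones(3 * T, 3 * T, device=seq.device), 1).bool()
        h = self.transformer(seq, mask=mask)
        # Predict the action from the hidden state at each state-token position.
        return self.action_head(h[:, 1::3, :])


# Toy usage: predict the next dialogue action conditioned on a target return-to-go.
model = DialogueDecisionTransformer()
rtg = torch.ones(1, 5, 1)                 # desired cumulative reward
states = torch.randn(1, 5, 100)           # belief-/dialogue-state vectors
actions = torch.randint(0, 300, (1, 5))   # previously taken system actions
logits = model(rtg, states, actions)      # (1, 5, n_actions)
next_action = logits[0, -1].argmax().item()
```

At inference time, conditioning on a high return-to-go steers the model toward action sequences associated with successful dialogues in the offline data, which is the core idea behind return-conditioned sequence modeling.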