One of the goals of developing and implementing Industry 4.0 solutions is to significantly increase the flexibility and autonomy of production systems. The intent is to enable systems to reconfigure themselves, creating more efficient and adaptive manufacturing processes. Achieving these goals requires the comprehensive integration of digital technologies with real production processes, leading to the creation of so-called Cyber–Physical Production Systems (CPPSs). Their architecture combines physical and cybernetic elements, with a digital twin as the central element of the “cyber” layer. However, for the responses obtained from the cyber layer to enable a quick reaction to changes in the production system's environment, its virtual counterpart must be supplemented with advanced analytical modules. This paper proposes a method for creating a digital twin of a production system based on discrete simulation models integrated with deep reinforcement learning (DRL) techniques for CPPSs. Here, the digital twin serves as the environment with which the reinforcement learning agent interacts to find a strategy for allocating processes to production resources. The Asynchronous Advantage Actor–Critic (A3C) and Proximal Policy Optimization (PPO) algorithms were selected for this research.
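
To illustrate the general idea of coupling a simulation-based digital twin with a DRL agent, the following is a minimal sketch, not the authors' implementation: a toy discrete simulation of job-to-machine allocation is wrapped as a Gymnasium environment and trained with PPO from Stable-Baselines3. All class names, state/reward definitions, and parameters here are illustrative assumptions.

```python
# Illustrative sketch only: a toy "digital twin" of job allocation wrapped as a
# Gymnasium environment and trained with PPO (Stable-Baselines3). The real
# digital twin in the paper is a discrete simulation model; this stand-in just
# shows the agent-environment coupling pattern.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO


class DigitalTwinEnv(gym.Env):
    """Toy environment: allocate each arriving job to one of n_machines."""

    def __init__(self, n_machines=3, n_jobs=50):
        super().__init__()
        self.n_machines = n_machines
        self.n_jobs = n_jobs
        # Action: index of the machine that receives the next job.
        self.action_space = spaces.Discrete(n_machines)
        # Observation: remaining busy time of each machine + next job's duration.
        self.observation_space = spaces.Box(
            low=0.0, high=np.inf, shape=(n_machines + 1,), dtype=np.float32)

    def _obs(self):
        nxt = self.job_durations[self.job_idx]
        return np.append(self.busy_time, nxt).astype(np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.busy_time = np.zeros(self.n_machines, dtype=np.float32)
        # One extra duration so the terminal observation stays well defined.
        self.job_durations = self.np_random.uniform(1.0, 10.0, size=self.n_jobs + 1)
        self.job_idx = 0
        return self._obs(), {}

    def step(self, action):
        duration = self.job_durations[self.job_idx]
        wait = float(self.busy_time[action])   # job waits until the machine frees up
        self.busy_time[action] = wait + duration
        reward = -wait                         # penalize queueing delay
        self.job_idx += 1
        terminated = self.job_idx >= self.n_jobs
        return self._obs(), reward, terminated, False, {}


if __name__ == "__main__":
    env = DigitalTwinEnv()
    model = PPO("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=10_000)       # agent learns an allocation policy
```

In the approach described in the paper, the hand-coded dynamics above would be replaced by queries to the discrete simulation model acting as the digital twin, while the agent-side interface (observations, actions, rewards) remains the point of integration.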